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Special  Issue  on 

Neural  Network  Applications  in  Electromagnetics 

Guest  Editor  Introduction 

Neural  computing  and  machine  learning  algorithms  have  arrived  and  are  here  to  stay!  In 
the  last  ten  years  neural  networks  have  experienced  an  unbelievable  growth,  both  in  terms  of 
novel  neural  network  architectures  that  have  appeared  in  the  literature,  and  new  applications 
where  neural  networks  have  been  used  successfully.  The  high-speed  capabilities  and  “learning” 
abilities  of  neural  networks  can  be  applied  to  quickly  solving  numerous  complex  optimization 
problems  in  electromagnetics,  and  this  special  issue  shows  you  how.  Even  if  you  have  no 
background  in  neural  networks,  the  papers  that  appear  in  this  issue  will  give  you  a  flavor  of  the 
different  applications  that  neural  networks  can  be  applied  to. 

In  the  past,  neural  networks  (NNs)  have  been  applied  to  modeling  and  design  of  antennas, 
microstrip  circuits,  embedded  passive  components,  semiconductor  and  optical  devices,  and  so 
on.  Today,  support  vector  machines  (SVM)  have  also  emerged  in  the  area  of  machine  learning 
and  have  been  used  mainly  in  the  area  of  pattern  recognition  and  classification.  In  this  issue,  two 
of  the  papers  discuss  a  machine  learning  approach  to  solving  electromagnetics  problems.  One  of 
them  compares  results  between  classical  neural  networks  and  SVM’s. 

There  are  basically  four  main  situations  in  which  NNS  and  SVMs  are  good  candidates  for 
use  in  electromagnetics. 

1.  When  closed  form  solutions  do  not  exist  and  trial  and  error  methods  are  the  only 
approaches  to  solving  the  problem  at  hand. 

2.  \^en  the  application  requires  real-time  performance. 

3.  When  faster  convergence  rates  and  smaller  errors  are  required  in  the  optimization  of 
large  systems. 

4.  When  enough  measured  data  exist  to  train  an  NN  or  an  SVM  for  prediction 
purposes,  especially  when  no  analytical  tools  exist. 

5.  When  they  can  be  used  in  conjunction  with  other  numerical  techniques  for 
enhancement  purposes. 

This  special  issue  includes  7  papers  all  of  which  are  very  different  yet  they  have  one  unifying 
factor  which  is  the  use  of  NNS  and  SVM  in  tackling  the  problem  at  hand.  The  paper  is  an 
example  of  how  both  neural  networks  and  support  vector  machines  can  be  used  to  classify  buried 
objects  (a  classification  problem).  The  second  paper  shows  how  neural  networks  can  be  used 
along  with  signal  processing  techniques  for  bio-medical  applications  and  sensors.  In  the  third 
paper  we  see  an  example  of  how  neural  networks  can  be  combined  with  equivalent  circuit 
formulations  and  other  approaches  for  modeling  of  multilayer  printed  circuits.  The  fourth  paper 
introduces  the  use  of  SVM  in  training  adaptive  array  antennas  for  determining  the  direction  of 
arrival  of  a  signal  (DOA).  The  paper  includes  a  brief  introduction  of  machine  learning  and 
support  vector  machines  and  how  results  compare  with  the  more  classical  existing  techniques. 
The  fifth  paper  demonstrates  how  measured  data  can  be  used  to  train  neural  networks  to  predict 


resonances  for  microstrip  antennas  at  different  frequencies  and  for  different  dimensions.  The 
sixth  paper  is  an  example  of  how  neural  networks  can  be  used  in  problems  where  no  closed- 
form  solutions  exist  such  as  the  estimation  of  target  orientation  using  measured  radar  cross 
section  data.  The  last  paper  is  a  unique  example  of  using  neural  networks  with  the  asymptotic 
waveform  evaluation  (AWE)  to  speed  up  the  analysis  of  the  method  of  moments.  This  combined 
approach  is  applied  to  the  solution  of  a  microstrip  antenna.  Also,  several  references  are  included 
in  each  paper  and  the  hope  is  that  the  reader  will  be  exposed  to  the  wide  range  of  applications 
that  are  possible  today  in  the  area  of  electromagnetics  using  neural  networks  and  machine 
learning! 

Finally,  I  wanted  to  thank  the  following  reviewers  for  helping  me  with  this  issue:  Chaouki 
Abdallah,  Michael  Cryssomallis,  Said  El-Khamy,  K.  C.  Gupta,  Kerim  Guney,  Nafatli  (Tuli) 
Herscovici,  Q.  J.  Zhang,  and  Ahmed  EL  Zooghby.  Special  thanks  go  to  Atef  Elsherbeni  for 
coming  up  with  the  idea  behind  this  special  issue  and  being  patient  and  very  helpful  along  the 
way! 


Christos  Christodoulou 

University  of  New  Mexico 
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A  Comparative  Study  of  NN  and  SVM-Based 
Electromagnetic  Inverse  Scattering  Approaches  to 
On-Line  Detection  of  Buried  Objects 


Salvatore  Caorsi\  Davide  Anguita^,  Emanuela  Bermani^,  Andrea  Boni^, 

Massimo  Donelli^  and  Andrea  Massa^ 

^  Dept,  of  Electronics,  University  of  Pavia,  Via  Ferrata  1, 1-27100  Pavia,Italy 
^  Dept,  of  Biophysical  and  Electronic  Eng.,  University  of  Genoa,  Via  Opera  Pia  11  A,  1-16145,  Genova,Italy 
^  Dept,  of  Information  and  Communication  Tech.,University  of  Trento,  Via  Sommarive  14, 1-38050  Trento,Italy 


Abstract — Microwave-based  sensing  techniques  consti¬ 
tute  an  important  tool  for  the  detection  of  buried  targets. 
In  this  framework,  a  key  issue  is  represented  by  real¬ 
time  scatterer  localization.  As  far  as  such  a  topic  is 
concerned,  this  paper  presents  a  comparative  evaluation 
of  the  performances  provided  by  a  conventional  NN- 
based  inverse  scattering  technique  and  by  a  new  SVM- 
based  electromagnetic  approach.  In  order  to  estimate 
the  effectiveness  values  of  the  two  methods,  realistic 
configurations  and  noisy  environments  are  considered 
and  current  capabilities,  as  well  as  potential  limitations, 
are  pointed  out.  Finally,  possible  future  research  work  is 
outlined. 


I.  INTRODUCTION 

The  detection  of  buried  objects  by  means  of  interro¬ 
gating  electromagnetic  waves  is  usually  a  very  difficult 
task.  The  addressed  problem  is  nonlinear,  due  to  the 
relation  between  unknown  quantities  (object  parame¬ 
ters  and  field  distribution)  and  problem  data,  it  is  ill- 
posed  and,  generally,  only  aspect-limited  measures  are 
available. 

In  the  past  few  years,  considerable  efforts  have  been 
devoted  to  dealing  with  detection  or,  more  generally, 
reconstruction  problems,  and  several  approaches  have 
been  proposed.  As  far  as  weak  scatterers  are  con¬ 
cerned,  linearized  procedures  have  been  applied  (see 
[1],  [2],  [3]  and  references  cited  therein).  The  use  of 
closed  forms  of  the  scattering  equations  (based  on  the 
diffraction  theorem)  and  of  the  Fast  Fourier  Transform 
(FFT)  has  made  it  possible  to  obtain  faster  process¬ 
ing  rates  and  real-time  imaging.  Moreover,  numerical 
procedures  based  on  higher-order  Bom  approximations 
have  demonstrated  their  effectiveness  [4],  [5]. 

On  the  contrary,  nonlinear  algorithms  must  be  used 
when  strong  scatterers  are  considered.  The  retrieval 
problem  is  usually  recast  into  an  optimization  one  and 
is  effectively  solved  with  minimization  techniques  [6]- 
[10].  Unfortunately,  large  computational  resources  and 
a  high  computational  load  are  needed,  thus  making 


these  techniques  impracticable  (particularly  when  se¬ 
rial  implementations  are  realized)  if  real-time  perfor¬ 
mances  are  required. 

In  order  to  speed  up  the  detection  process,  a  key 
point  is  the  reduction  in  the  number  of  unknowns.  To¬ 
ward  this  end,  a-priori  information  (if  available)  on  the 
scenario  under  test  can  be  very  useful.  This  concept  has 
been  successfully  exploited  in  inverse  methodologies 
based  on  artificial  neural  networks  (NNs)  (see  [11] 
(pp.  475-479)  and  references  cited  therein).  As  far 
as  detection  problems  are  concerned,  methods  based 
on  both  multilayered-perceptron  [12],  [13]  and  radial- 
basis-function  [14]  neural  networks  have  demonstrated 
their  capabilities  for  on-line  retrieving  of  buried  cylin¬ 
drical  scatterers. 

Though  NN-based  approaches  have  generally  of¬ 
fered  good  performances  in  solving  inverse-scattering 
problems,  they  still  suffer  from  several  drawbacks  not 
completely  solved  up  to  now.  From  the  inductive- 
theory  point  of  view,  the  main  drawback  is  the  difficult 
control  of  the  complexity  of  underlying  NN  models. 
By  the  term  complexity  it  is  usually  meant  the  capacity 
of  a  learning  machine  to  fit  the  input  data.  Briefly,  if 
a  machine  is  too  complex,  it  will  typically  overfit  the 
data,  thus  losing  the  property  of  generalization  for  new 
measures  not  included  in  the  training  set.  If  complexity 
is  too  low,  the  machine  will  fail  to  correctly  interpret 
the  underlying  relations  among  training  samples.  The 
complexity  of  a  learning  machine  depends  on  many 
factors.  In  the  case  of  NNs,  the  numbers  of  hidden 
layers  and  neurons,  the  number  of  interconnections, 
and  the  learning  algorithm  used  for  the  training  process 
[15]  are  the  predominant  parameters.  Unfortunately, 
NNs  lack  an  effective  theory  suggesting  the  most 
suitable  NN  topologies  and/or  calibration  parameters. 
An  NN  adapts  its  internal  parameters  (i.e.,  the  weights) 
automatically  in  order  to  best  approximate  the  available 
training  data,  but  the  topology,  the  transfer  function 
and  the  other  parameters  are  heuristically  selected.  At 
present,  there  is  no  good  way  to  determine  how  many 
hidden  layers  or  how  many  hidden  nodes  each  layer 
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should  contain,  given  the  sample  data  with  which  to 
train  the  NN  for  the  solution  of  a  given  problem.  From 
the  computational  and  optimization  points  of  view, 
NNs  exhibit  drawbacks  as  well.  The  learning  process 
of  an  NN  eonsists  in  solving  a  nonlinear  optimization 
process.  Therefore,  any  conventional  optimization  al¬ 
gorithm,  including  the  widely  used  back-propagation 
procedure,  leads  to  reach  a  solution  that  corresponds  to 
one  of  the  local  minima  of  the  target  cost  function.  An 
empirical  way  to  face  such  a  problem  is  to  train  several 
NNs  with  different  starting  points,  thus  overloading  the 
optimization  process.  However,  this  choice  might  result 
in  inability  to  unambiguously  evaluate  statistical  and 
systematic  errors  on  neural  computations. 

A  possibility  to  overcome  these  drawbacks  is  based 
on  new  results  in  Statistical  Learning  Theory  (SLT) 
[16]  which  lead  to  new  algorithmic  paradigms  and 
new  computational  architectures  that,  though  still  based 
on  the  NN  model,  entirely  relinquish  their  biological 
plausibility  to  achieve  a  firm  theoretical  background. 
SLT  allows  one  to  derive  statistical  and  algorithmic 
properties  that  can  limit  or  avoid  altogether  the  NN 
problems  previously  described.  One  of  the  main  con¬ 
tributions  in  this  field  has  been  provided  by  Vladimir 
Vapnik  [16],  who  has  formulated  and  formalized  the 
inductive  rules  that  regulate  the  learning  process  by 
example  principles.  On  the  basis  of  these  fundamen¬ 
tals,  a  new  learning  paradigm,  called  Support  Vector 
Machine  (SVM),  has  been  developed.  After  initial 
studies,  SVMs  are  now  successfully  applied  in  several 
fields  ranging  from  pattern  recognition  to  function 
approximation  tasks.  From  a  theoretical  point  of  view, 
SVMs  turn  out  to  be  very  appealing,  as  compared  with 
conventional  NNs,  for  the  following  two  basic  reasons: 

•  the  constrained-quadratic  structure  of  the  opti¬ 
mization  problem  solved  for  the  learning  process; 

•  the  solid  statistical  theory  on  which  SVMs  are 
based. 

In  this  paper,  the  detection  of  buried  objects  by  means 
of  interrogating  electromagnetic  waves  is  addressed 
by  using  an  inductive  approach.  Within  the  frame¬ 
work  of  electromagnetic  retrieval,  the  effectiveness  and 
limitations  of  the  SVM-based  strategy  are  evaluated 
and  a  comparative  study  versus  conventional  NN-based 
methods  is  made.  Finally,  selected  numerical  results 
on  realistic  configurations  and  noisy  environments  are 
reported  and  discussed. 

11.  Mathematical  Formulation 

Let  us  consider  the  problem  of  determining  the 
unknown  parameters  of  an  object  buried  in  a  homo¬ 
geneous  soil.  With  reference  to  a  two-dimensional  ge¬ 
ometry,  let  Ds  be  a  lossy  region  with  complex  contrast, 
rs  =  [es  -  1]-J  enclosing  a  circular  cylindrical 
scatterer  of  radius  ps-  The  dielectric  properties  of  the 


Fig.  I.  Geometry  of  the  problem 

object  are  homogeneous,  tb,  and  the  dielectric  profile 
of  the  geometry  under  test  (Fig.  1)  can  be  described 
as  follows; 


^  "^0 

if  y  >  Los 

(  0  <  X  <  xb  +  PbcosB 

tb  i 

f  1  0<y<j/B+  PBsinO 

[  0  <  e  <  27r 

[  TS 

otherwise 

(1) 

This  scenario  is  illuminated  by  multiple  transmitters 
lying  on  Do  and  located,  in  the  upper  half  space, 
at  the  positions  t  =  The  probing 

fields,  E"''ix,  y),  are  radiated  in  the  free  space  and  at 
a  fixed  frequency  by  a  known  distribution  of  current 
filaments  parallel  to  the  z-axis.  The  same  probes  work 
as  receivers  for  the  anomalous  field. 

Under  these  hypotheses,  the  addressed  inverse  scat¬ 
tering  problem  can  be  mathematically  stated  as  fol¬ 
lows.  Starting  from  the  knowledge  of  the  anoma¬ 
lous  field,  S*"',  collected  at  the  receiver  positions 
T  ~  1,. ..,/?} 

E^‘>^{xr,yr\xt,yt)  =  E'”-'={xr,yr\xt,yt)+ 

+k^  fjy  Es{x,y\xt,yt)Gs{xr ,yr\x, y)T  (x, y)  dxdy 

(2) 

determine  the  set  of  unknown  parameters 
{{xB,yB),PB,TB)  defining  the  scatterer  under 
test.  In  eq.  (2): 

•  E'^’'{Xr,yr\Xt,yt)  =  E"''{Xr,yT\Xt,yt)  + 
E''^^ (xr,yT\xt,yt)  is  the  electric  field  at  the 
receivers  in  the  absence  of  the  object; 

.  E'^^^ {xr,yr\xt,yt)  is  the  electric  field  reflected 
by  the  planar  interface  at  the  receivers; 

•  Es{x,y\xt,yt)  is  the  electric  field  induced  inside 
the  reconstruction  domain  Dg  when  it  contains 
the  object; 

•  Gs{xr,yr',x,y)  is  the  Sommerfeld-Green  func¬ 
tion  for  the  half-space  geometry  [6]. 
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Then  the  solution  of  the  addressed  inverse  scattering 
problem  requires  the  determination  of  the  nonlinear 
function,  defined  as  follows 

X  =  $  {E^}  (3) 

,  where  x  is  the  “scatterer  array”  (x  =  [Xp'iP  -  1.  -i  ^1 
=  [{xB,yB),PB,TB],  p  being  the  number 
of  unknown  parameters)  and  is  the  data 

array  defined  as  =  [E^°H^r,yr\xt,yty, 

r  =  t  =  This  is  a  regression 

problem  in  which  the  unknown  function  ($) 
must  be  approximated  by  the  knowledge  of  a 
number  of  known  input-output  pairs  of  vectors 

III.  Learning-by-Examples  Strategies  for 
Inverse  Scattering  Problems 

The  inverse-scattering  problem  described  in  Section 
II  can  be  addressed  in  several  ways.  From  a  math¬ 
ematical  point  of  view,  the  key  issue  is  to  find  an 
approximation,  $,  for  the  unknown  function  $  on  the 
basis  of  a  set  of  samples  {(Enifin);  n  =  1, 
and  e„  being  an  input  pattern  (i.e.,  a  data  array  = 
E‘°*)  and  the  corresponding  target  (i.e.,  a  scatterer 
array  e„  =  (x)  ).  respectively.  This  is  a  typical 

learning-by-examples  problem,  which  is  usually  faced 
in  the  presence  of  unknown  systems  with  measurable 
input/output  signals.  In  the  following,  two  approaches 
based  on  a  multilayer  perceptron  (MLP)  neural  net¬ 
work  and  on  an  SVM,  respectively,  will  be  presented. 

A.  MLP-NN  Approach 

Neural  networks  are  distributed  computational  sys¬ 
tems  characterized  by  a  multi-layered  structure  of 
neurons  fully  interconnected  by  weighted  links.  MLP- 
NNs  can  be  considered  as  universal  approximators  for 
any  function  $  :  0?^^^  -4  [17].  Therefore,  they 

are  suitable  for  facing  with  regression  problems  char¬ 
acterized  by  complex  nonlinear  relations  between  data 
and  unknowns,  such  as  inverse  scattering  or  buried- 
object  detection  problems.  In  this  context,  is 

the  space  of  arrays  representing  measurement  data,  and 
5?^  is  the  space  of  unknown  parameters  describing  a 
buried  object. 

MLP-NN  theory  [II]  suggests  approximating  $  by 
a  nonlinear  function  of  the  weighted  measurement  data 

$(e)  =  +^l}  (4) 

where  L  is  the  number  of  layers;  ^  = 

$  +h]>^  =  being  ® 

is  the  nonlinear  activation  function  (e.g.,  a  sigmoid); 

and  bj  are  the  weight  matrix  and  the  bias 

array  of  the  I-th  layer,  respectively.  Given  known 
input-output  pairs  of  vectors  (called  training  set). 


Ft.aininp  =  {(e„,  ej;  n  =  1, ...,  iV},  and  according 
to  a  backpropagation  algorithm,  weights  and  biases 
are  computed  by  minimizing  the  error  function  ip 

iP  {^('-1-'),^;  I  =  1,  ...,l}  =  ||sn  -  5  (l„)|| 

(5) 

by  a  gradient  descent  procedure. 

Therefore,  the  direct  solution  of  the  inverse¬ 
scattering  problem  is  avoided,  and  real-time  (after  the 
training  phase)  solutions  to  buried-object  localization 
are  obtained  [13].  However,  as  the  error  function 
(5)  is  non-convex,  one  of  the  main  drawbacks  of 
the  approach  is  the  presence  of  local  minima  where 
the  optimization  algorithm  might  be  trapped  and  the 
solution  of  which  would  have  no  physical  significance. 

B.  SVM-Based  Approach 

In  order  to  avoid  the  drawbacks  of  the  NN-based 
inverse  scattering  method  related  to  the  nonlinear  fit¬ 
ting  of  the  training  samples,  an  SVM-based  approach  is 
presented.  The  underlying  idea  of  the  SVM  procedure 
is  to  split  the  approximation  for  the  nonlinear  function 
$  into  two  steps.  Instead  of  performing  a  nonlinear 
fitting  in  the  input  space,  a  nonlinear  mapping  of  the 
training  samples  from  the  input  space  into  a  larger 
(possibly  infinite)  space  (i.e,  the  feature  space,  K’’) 
is  first  performed.  Then,  a  simple  linear  fitting  is 
carried  out  in  the  new  space,  thus  avoiding  typical 
nonlinear-fitting  drawbacks  and  keeping  the  advantages 
of  a  linear  approach.  Moreover,  by  exploiting  some 
mathematical  properties  of  nonlinear  mappings,  the 
evaluation  of  the  data  in  the  feature  space  is  not 
required,  as  the  SVM  does  not  have  to  explicitly  work 
in  this  space. 

In  more  detail,  each  data  array  is  mapped  into 
the  feature  space  through  a_nonlinear  transformation 
p  ;  3?^^^  -F  3?^  with  T  R  X  T.  Then,  the 
samples  in  the  feature  space  are  linearly  interpolated 
by  defining  a  hyperplane  with  a  normal  vector  w_.  Thus, 
the  approximating  function  is  given  by 

^  {u)  =  w-  ip{u)  +  b  (6) 

Among  all  possible  hyperplanes,  SVMs  find  the  one 
that  corresponds  to  a  function  $  having  at  most  a 
deviation  e  from  each  target  (*),  for  all  the 
measures  u„,  and  that,  at  the  same  time,  is  as  “flat” 
as  possible.  As  it  is  impossible  for  all  the  points  to 
lie  inside  the  e  band,  some  errors  (^n,  called 

slack  variables)  are  allowed  and  linearly  weighted. 
Mathematically,  this  description  leads  to  a  constrained 

(*)  As  up  to  now  it  has  been  possible  to  synthesize  only  single¬ 
output  SVM,  we  refer  to  the  estimation  of  a  single  scatterer  array 
component  =  (Xp)ni  P  =  1>  ■ 


68 


ACES  JOURNAL,  VOL.  18,  NO.  2,  JULY  2003,  SI:  NEURAL  NETWORK  APPLICATIONS  IN  ELECTROMAGNETICS 
quadratic  optimization  problem  (CQP)  where  the  reg-  implicitly  with  nonlinear  mappings  through  the  use  of 
ularized  cost  function  7  Kernel  functions 


1  ^  ^ 

+  +  (7) 

is  minimized  over  w  and  b,  subject  to  the  following 
constraints: 

-  w-  -  b  <  e  + 

*  w-^(u)  +  b  —  Cn^  <  e  +  V n  (8) 

^n.C>0 

The  function  7  is  composed  of  two  terms.  The  first 
forces  the  hyperplane  to  be  as  flat  as  possible,  and 
the  second  penalizes  the  deviation  of  each  target  from 
the  function  $.  The  constant  C  measures  the  tradeoff 
between  the  two  terms.  It  can  be  shown  that  this 
approach  can  be  used  to  control  the  complexity  of 
the  learning  machine,  according  to  the  Structural  Risk 
Minimization  principle  [16].  This  principle  guarantees 
a  considerable  generalization  ability  of  the  model,  and 
provides  upper  bounds  to  such  ability,  albeit  in  a 
statistical  framework.  It  is  also  interesting  to  note  that 
this  formulation,  which  derives  from  SLT,  resembles 
closely  the  regularization  approach  that  is  usually 
exploited  when  dealing  with  ill-posed  problems,  like 
inverse  ones  [18]. 

The  problem  defined  by  eqs.  (7)-(8)  is  then  rewritten 
in  dual  form  by  using  the  Lagrange  multiplier  theory. 
By  introducing  2N  Lagrange  multipliers,  a„,a*,n  = 
a  dual  functional,  jd,  to  be  maximized  is 
obtained  (see  [19]  or  [16]  for  more  mathematical 
details): 

'id{a,QC]  = 

{-5  Elj=i  )  («J  -  “j')  ^  + 

-cEli  K  +  <)  +  Eti  («n  -  0} 

(9) 

subject  to 

f;(a„-a;)-0  a„,a;e[0,C]  (10) 

n=l 


7{w,6} 


k{iLi,Ej)  (13) 

The  theory  of  kernels,  that  is,  the  conditions  under 
which  equation  (13)  holds,  has  been  known  since  the 
beginning  of  the  last  century;  it  is  based  on  Mer¬ 
cer’s  theorem  [16]  and  has  been  applied  to  pattern 
recognition  tasks  since  the  ’60s  [20].  However,  only 
recently  has  the  connection  with  learning  machines 
been  well  formalized  [18].  Kernel  functions  are  pos¬ 
itive  semidefinite  functionals.  Therefore,  using  this 
property  and  the  fact  that  the  constraints  of  the  above 
optimization  problem  are  “affine”,  any  local  minimum 
is  also  a  global  one,  and  algorithms  exist  by  which 
the  solution  can  be  found  in  a  finite  number  of  steps 
[21].  Furthermore,  if  the  kernel  is  strictly  positive 
definite  (that  is  always  the  case,  except  in  pathological 
situations),  the  solution  is  also  unique.  These  properties 
overcome  many  typical  drawbacks  of  classical  neural- 
network  approaches,  such  as  the  determination  of  a 
suitable  minimum,  the  choice  of  the  starting  point,  the 
optimal  stopping  criteria,  and  so  on. 

Since  the  publication  of  early  seminal  works  on  ker¬ 
nel  functions,  many  functionals  have  been  found  that 
satisfy  Mercer’s  theorem.  As  far  as  inverse-scattering 
problems  are  concerned,  a  Gaussian  kernel 

l|2l 


k  (Ei,Ej)  =  exp 


2cr2 


(14) 


performing  a  mapping  in  an  infinite-dimensional  fea¬ 
ture  space  [18]  and  preliminarily  used  in  [19],  has 
demonstrated  its  effectiveness. 

Concerning  the  SVM  parameters,  the  threshold  b 
is  computed  by  means  of  the  Karush-Khun-Tucker 
conditions  of  the  CQP  at  optimality  [19],  and  the 
hyper-parameters  of  the  problem  (cr,  q,  C  and  e)  are 
deduced  by  accomplishing  the  model-selection  task 
proposed  in  [22]. 

Finally,  the  CQP  is  solved  by  a  standard  optimiza¬ 
tion  algorithm,  namely,  Platt’s  SMO  algorithm  for 
regression  [23]. 


IV.  Numerical  Results 


l!L='£^{an-a*^)^{En)  (U) 

n—1 

Consequently,  $  is  equal  to 
N 

=  £{e„)  ■  ¥>  (e)  +  b  (12) 

n— 1 

where  only  the  inner  product  of  the  nonlinear  mapping 
function  (and  not  the  function  itself)  appears.  This  is 
the  well-known  kernel  trick  that  allows  one  to  deal 


In  this  work,  a  comparative  study  of  NN  and  S VM- 
based  approaches  is  made  concerning  the  localization 
of  a  scatterer  buried  in  the  soil.  Let  us  consider  a 
square  investigation  domain  Ls  —  A-sided  (A  being 
the  ffee-space  wavelength)  completely  embedded  in 
the  ground.  The  relative  permittivity  of  the  soil  is 
es  =  20.0  and  the  conductivity  is  as  =  0.01  The 
center  of  the  region  under  test  is  Los  =  deep.  The 
buried  object  is  a  lossless  circular  cylinder  of  radius 
Pb  =  iA  and  the  relative  permittivity  of  the  ground 
is  equal  to  £5  =  5.0.  This  scenario  is  illuminated  by 
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Fig.  2.  Training  set.  Geometrical  arrangement  of  the  center  of  the 
cylinder  under  test 
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Fig.  3.  Test  set.  Geometrical  arrangement  of  the  center  of  the 
cylinder  under  test 


an  electric  line  source,  located  in  the  upper  region  with 
the  coordinates  Xt  =  0  and  yt  =  f  A,  t  =  T  =  1,  and 
parallel  to  the  air-soil  interface.  The  anomalous  field 
is  collected  at  il  =  16  measurement  points  equally 
spaced  (d  =  and  located  on  a  line  placed  close  to 
the  air-soil  interface  (Lst  = 

The  data  used  to  generate  the  training  set  and 
those  used  to  test  the  SVM  approach,  as  well  as 
the  MLP  neural  network,  were  obtained  synthetically 
by  a  Finite  Element  code  and  a  PML  technique 
[24].  During  the  learning  phase,  the  training  set, 
{Ftrainins,  I  N  =  729),  was  obtained  by  moving  the 
center  of  the  cylinder  inside  Ds  among  the  positions 
shown  in  Figure  2  and  collecting  the  anomalous  field 
at  the  receiver  positions.  As  far  as  the  test  phase  is 
concerned,  M  =  84  randomly  chosen  locations  of  the 
scatterer  (Fig.  3)  were  considered  in  order  to  define 
the  test  set  Ftejt  =  {(Em>£m)i  ^ 

additive  Gaussian  noise,  characterized  by  the  signal- 
to-noise  ratio  (SNR) 


(15) 


*^noi«e  lis'tJg  lit®  variance  of  noise,  affected  the  mea¬ 
sured  signals. 

A  two-layer  MLP-NN  [12],  characterized  by  32 
inputs,  32  hidden  neurons,  and  2  output  neurons,  was 
trained  by  using  a  delta-bar-delta  back  algorithm  [25] 
in  order  to  overcome  the  shortcomings  of  the  gradient- 
descent  procedure  and  to  increase  the  convergence  rate 
of  the  standard  back-propagation  learning  algorithm. 

Concerning  the  SVM-based  approach,  two  SVMs 
were  used  to  estimate  the  center  coordinates  of  the 
buried  object.  Moreover,  after  the  optimal  selection 
procedure,  the  values  of  the  SVM  hyperparameters 
turned  out  to  be  constant  quantities  equal  to  (C)  = 

(O  =  100  and  e  =  0.001.  On  the  contrary, 
the  variance  values  of  the  Gaussian  kernels, 
and  ,  were  determined  independently  of  each 

scenano  under  test. 


A.  Definitions 

In  order  to  quantitatively  estimate  the  effectiveness 
of  the  presented  approaches,  some  error  values  are 
defined.  Let  us  introduce  the 
•  local  errors  on  the  center  location,  (5“  and  : 


(5“  = 


= 


®oc*  *rec  I 


u  = 

v(u)  =  l,...,V(u) 
V  =  1,...,V; 
u(v)  =  l,...,U(v) 


(16) 

•  local  average  errors  on  the  object  localiza¬ 
tion,  Cx  =  {C“.  w  =  l . U)  and  Cy  = 

{g,v  =  l,...,V}: 


C  = 

Sa; 


1  y'V’(ii)  3.v(u) 
V(«)  2-^v(«)=l 


..V  1  ,/ 

Vaet  V{v)  ^u(v)=l 


y“'’'> 

Vrec 


<y  = - s;:;; - 

global  average  errors,  &x  and  0,,: 


u  =  1, ...,  U 

v  =  l . V 

(17) 


''(u)' 

^act  V(u) 

jyZL, 

^U(v) 

Vact 

( 

where  the  subscripts  rec  and  act  refer  to  the  estimated 
and  actual  coordinates  of  the  scatterer,  respectively; 
dmax  =  I'S  is  the  maximum  error  in  defining  the 
coordinates  of  the  center  of  the  circular  scatterer  when 
it  is  contained  in  the  investigation  domain,  Ds- 
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Fig.  4.  Reconstructed  data  versus  actual  data  (Noiseless  Case),  (a)- 

md)  ^ 

B.  Numerical  Assessment  -  Scenario  I 

The  first  example  is  aimed  at  evaluating  the  pos¬ 
sibility  of  locating  the  buried  object  starting  from 
the  knowledge  of  the  measured  electric  field  and 
assuming  the  knowledge  of  the  soil  characteristics 
to  be  a-priori  information  about  the  geometry  un¬ 
der  test.  Consequently,  the  incident  field  is  a  known 
quantity  and  the  data  array  is  defined  as  follows: 

=  [E*^°*^{xr,yT\xt,yt)  -  E"^''{xr,yr\xt,yty, 
r  =  t  =  1,  In  this  context,  the  SVM 

parameters  have  been  chosen  equal  to  =  0-64 

and  (cr^)  =  0.32. 

Figure  4  shows  the  localization  results  for  the  ex¬ 
amples  making  up  the  test  set  and  obtained  by  using 
the  MLP-NN  (Fig.  4(a)-(6))  and  the  SVM-based  proce¬ 
dure  (Fig.  4(c)-(c0).  respectively.  As  can  be  observed, 
a  good  accuracy  concerning  the  center  location  is 
achieved  along  the  two  reference  axes  and  for  both  the 
MLP-NN  and  SVM-based  approaches.  In  particular, 
even  if  the  detection  accuracy  decreases  as  the  distance 
from  the  air-soil  interface  increases,  good  localizations 
are  achieved  in  the  whole  domain,  as  confirmed  by 
the  statistics  shown  in  Table  I.  In  particular,  as  far  as 
the  scatterer  depth  estimation  is  concerned,  the  SVM 


(d) 

-(b)  MLP-NN  approach,  (c)-(d)  SVM-based  approach,  (a)-(c)^  and 


sharply  reduces  the  error  of  the  MLP-NN,  reaching 
an  average  error  equal  to  <  6^  >svm=  0.0584 
(<  5^  >MLP=  0.1004  being  the  average  error  made 
by  the  MLP-NN  approach).  Moreover,  it  should  be 
noted  that  the  time  required  for  the  SVM  training  is 
about  one  tenth  of  the  one  required  by  the  MLP-NN, 
whereas  there  is  no  significant  diflFerence  between  the 
computation  times  taken  by  the  two  approaches  for  the 
object  localization  (i.e,  after  the  learning  phase). 

TABLE  I 

Scenario  I  (Noiseless  Case).  Local  error  statistics 


■Slag 

MLP 

■tllMMI 

0.2098 

UQUjiil 

SVM 

0.0177 

mmvEsm 

■tmiaiijHi 

BtiEi 

HQlia 

■nurgiiM 

rffTHTM 

SVM 

Biffitrai 

In  order  to  assess  the  robustness  of  the  learning- 
based  retrieval  strategies,  a  noisy  environment  has 
been  taken  into  account.  For  all  the  simulations,  the 
buried  cylinder  and  the  electromagnetic  scenario  are 
unchanged  and  characterized  by  the  same  dielectric 
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Fig  5  Scenario  I  {Standard  Validation).  Local  average  errors  of  the  MLP-NN  and  SVM-based  procedures  for  different  signal-to-noise 
ratios:  (a)  SNR  =  5  dB,  (6)  SNR  =  10  dB,  (c)  SNR  =  20  dB,  (d)  SNR  =  35  dB,  (e)  SNR  =  50  dB,  and  (/)  SNR  =  100  dB 


properties  as  during  the  training  phase.  The  local  aver¬ 
age  errors  are  given  in  Figure  5.  For  different  signal-to- 
noise  ratios,  the  plots  of  Cx  and  Cy  related  to  both  the 
MLP-NN  and  the  SVM-based  procedures  are  shown. 
As  expected,  the  estimation  of  the  scatterer  depth  turns 
out  to  be  more  difficult  than  the  horizontal  detection. 
However,  the  performances  guaranteed  by  the  SVM 
procedure  generally  outperform  those  achieved  by  the 
MLP-NN  approach.  Concerning  the  dependence  of 
the  reconstruction  properties  on  the  SNR  value,  the 
scatterer  is  located  quite  correctly,  and  Cx  <  0.025 
whatever  the  noisy  case  considered.  Moreover,  Cy  is 
greater  than  0.05  only  in  the  interface  regions  (i.e., 
near  the  air-soil  interface  and  at  the  bottom  of  the 


investigation  area).  This  behavior  is  not  surprising,  as 
confirmed  by  the  experimental  results  reported  in  [26], 
where  the  problem  of  the  pollution  of  the  useful  signal 
due  to  the  reflections  of  the  air-ground  interface  is 
clearly  pointed  out. 


Another  evaluation  of  the  robusmess  of  the  proposed 
approaches  has  also  been  obtained  by  carrying  out 
the  so-called  cross  validation  test.  The  two  methods 
have  been  trained  with  a  noisy  data  set  (i.e.,  a  data 
set  whose  samples  are  related  to  an  assigned  signal- 
to-noise  ratio  SNRTraining)  and  tested  with  a  test  set 
computed  in  a  different  noisy  environment  (SNRrest)- 
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(c)  (d) 

Fig.  6.  Scenario  1  (Cross  Validation).  Global  average  errors:  (a)  {&x]mlp-nN’  W  {^v)mlp-nn>  and  (d) 


{^y)svM 

Figure  6  shows  a  color-level  representation  (*)  of  the 
global  average  errors  for  different  values  of  signal- 
to-noise  ratio  of  the  training  and  test  sets  ranging 
from  5dB  to  100  dB.  Figures  6(a)-(c)  and  6(b)-(d) 
refer  to  the  MLP-NN  approach  and  the  SVM  method, 
respectively.  As  expected,  the  smallest  values  of  the 
global  errors  are  reached  when  the  same  noisy  en¬ 
vironment  is  considered  for  both  the  training  and 
test  data-sets.  Otherwise,  the  SVM  method  always 
outperforms  the  MLP-NN  approach  in  the  estimation 
of  the  horizontal  coordinate  of  the  scatterer  (0i).  As 
far  as  the  depth  of  the  scatterer  location  is  concerned, 
similar  conclusions  can  be  drawn  for  the  region  defined 
by  the  following  ranges:  SNRTraining  >  10  dB  and 

*  The  two  pixels  at  the  right-bottom  angles  of  the  plots  indicate 
the  minimum  and  maximum  values  of  the  global  errors. 


SNRrest  >  10  dB.  Otherwise,  the  comparative  study 
does  not  provide  any  significant  information. 

C.  Numerical  Assessment  -  Scenario  2 

In  the  second  example,  a  more  complex  scenario 
has  been  preliminarily  considered.  No  information 
about  the  soil  is  available  and  the  problem  data 
are  the  measures  of  the  anomalous  field,  = 

[E*°*{xr,yr\xt,yt);  r  -l,...,R-,t  =  l,...,T]  .  As 
far  as  the  choice  of  the  hyperparameters  is  concerned, 
the  same  value  equal  to  0.04  has  been  assumed  for 
(cr^)  and  for  (a^) 

As  expected  (Fig.  7),  the  performances  of  the 
leaming-by-examples  strategies  considerably  reduce, 
as  compared  with  Scenario  1.  However,  the  higher 
effectiveness  of  the  SVM-based  procedure  than  that 
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SMR  •  100  dS 


(e)  (0 

Fig.  7.  Scenario  2  (Standard  Validation).  Local  average  errors  of  the  MLP-NN  and  SVM-based  procedures  for  different  signal-to-noise 
ratios  in  the  range  between  SNR  =  5  dB  and  SNR  =  100  dB 


of  the  MLP-NN  method  is  confirmed.  Starting  from 
SNR  =  20  dB,  the  local  error  values  turn  out  to  be 
smaller  than  0.15.  On  the  contrary,  the  performances 
of  the  MLP-NN  method  strongly  worsen,  as  indicated 
by  the  dashed  lines  in  Figure  7. 

V.  Conclusions 

In  this  paper,  two  inductive  methods  for  the  detec¬ 
tion  of  buried  objects  have  been  extensively  compared. 
Starting  from  an  integral  formulation  of  the  scattering 
equations,  the  buried-object  localization  has  been  re¬ 
formulated  as  a  regression  problem  and  successively 
solved  by  means  of  two  leaming-by-examples  strate¬ 
gies,  namely,  the  MLP-NN  approach  and  the  SVM- 
based  procedure.  The  estimation  of  the  effectiveness 


of  the  proposed  procedures  has  been  carried  out  in 
different  test  cases  that  have  clearly  confirmed  the 
higher  robustness  of  the  SVM-based  approach  in  solv¬ 
ing  difficult  approximation  problems  as  compared  with 
traditional  neural  networks.  Several  scenarios  have 
been  considered  and  the  behaviors  of  the  two  inductive 
models  have  been  illustrated  for  different  operating 
conditions.  The  obtained  results  have  demonstrated  the 
successful  application  of  the  SVM-based  procedure  to 
solve  inverse-scattering  problems.  Future  work,  cur¬ 
rently  under  development,  will  be  devoted  to  improving 
the  performances  of  the  SVM-based  procedure  and  to 
determining  customized  kernel  functions. 
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Abstract 

Learning  theories  and  algorithms  for  both  supervised  and 
unsupervised  Neural  Networks  (NNs)  have  already  been 
accepted  as  relevant  tools  to  cope  with  difficult  problems 
based  on  the  processing  of  experimental  electromagnetic 
data.  These  kinds  of  problenB  are  typically  fomiulated  as 
inverse  problems.  In  this  paper,  in  particular,  the 
electrical  signals  under  investigations  derive  from 
experimental  electromyogram  Interference  patterns 
measured  on  human  subjects  by  means  of  non-invasive 
sensors  (surface  ElectroMyoGraphic,  sEMG,  data).  The 
monitoring  and  the  analysis  of  dynamic  sEMG  data 
reveals  important  information  on  muscles  activity  and 
can  be  used  to  clinicians  for  both  preventing  dramatic 
illness  evolution  and  improving  athletes  performance. 
The  paper  proposes  the  use  of  Independent  Component 
Analysis  (ICA),  an  unsupervised  learning  technique,  in 
order  to  process  raw  sEMG  data  by  reducing  the  typical 
“cross-talk”  effect  on  the  electric  interference  pattern 
measured  by  the  surface  sensors.  The  ICA  is 
implemented  by  means  of  a  multi-layer  NN  scheme. 
Since  the  IC  extraction  is  based  on  the  assumption  of 
stationarity  of  the  involved  sEMG  recording,  which  is 
often  inappropriate  in  the  case  of  biomedical  data,  we 
also  propose  a  technique  for  dealing  with  non-stationaiy 
recordings.  The  basic  tool  is  the  wavelet  (time- 
frequency)  decomposition,  that  allows  us  to  detect  and 
analyse  time-varying  signals.  An  auto-associative  NN 
that  exploits  wavelet  coefficients  as  an  input  vector  is 
also  used  as  simple  detector  of  non-stationarity  based  on 
a  measure  of  reconstraction  error.  The  proposed 
approach  not  only  yields  encouraging  results  to  the 
problem  at  hand,  but  suggests  a  general  approach  to 
solve  similar  relevant  problems  in  some  other 
experimental  applications  of  electromagnetics. 

1.  Introduction 

Most  relevant  medical  problems  are  today  faced 
by  processing  (by  visual  inspection  or  some  automatic 
means)  electrical  signals  detected  on  the  human  body. 
Evaluation  of  patient  populations  often  includes  the  use 
of  ancillary  tests  for  diagnosis  and/or  prognosis.  Data 
sets  collected  from  these  diagnostic  tests,  such  as  the 
Electroencephalogram  (EEG),  the  Electromyogram 
(EMG),  the  Electrocardiogram  (ECG)  and,  more 
recently,  functional  Magnetic  Resonance  Imaging 
(fMRI),  tend  to  be  complex,  large  and  high-dimensional. 
The  trend  towards  digitization  of  the  traditionally  analog 


EEG,  EMG  and  ECG  signals  has  coincided  with  the 
development  of  computing  power  and  multivariate  signal 
processing  techniques  capable  of  manipulating  and 
analyzing  such  large  data  sets  [Akay  M.,1997]. 

The  use  of  Independent  Component  Analysis 
(ICA),  an  unsupervised  learning  technique  which 
generalizes  Principal  Component  Analysis  (PCA), 
commonly  implemented  through  Neural  Network  (NN) 
schemes,  is  proposed  in  this  study  to  process 
experimental  biomedical  data.  Applied  to  sEMG  (surface 
ElectroMyoGraphy)  data,  ICA  results  in  numerous 
spatially-independent  patterns,  each  associated  with  a 
unique  time-course,  providing  a  way  to  separate  different 
electrical  signals  coming  from  different  muscle  activities 
[Jung  T.P.,  2000].  In  contrast  to  the  variable  nature  of  the 
surface  EMG  recorded  from  a  single  muscle  in  isolation, 
ICA  of  the  sEMG  from  several  muscles  simultaneously 
allows  the  detection  of  highly  reproducible  components 
for  example  in  the  sEMG  of  the  face  and  the  throat 
during  swallowing  and  in  the  sEMG  of  arm  muscles 
during  reaching  movements  [McKeown  M.J.,  2002]. 

The  researches  reported  in  the  present  study 
show  important  applications  in  the  study  of  some 
neurological  diseases,  and  in  the  monitoring  of  athletic 
activities  for  improving  significantly  the  potential  of 
athletes  as  well  as  the  capabilities  of  normal  subjects  in 
daily  actions,  since  it  makes  it  possible,  in  principle,  to 
enhance  motor  coordination.  Also,  musculo-skeletal 
disorders  are  the  first  cause  of  patient-physician 
encounters  in  the  industrialized  countries  [IEEE 
Engineering  in  Medicine  and  Biology,  2001]. 

This  paper  is  organized  as  follows.  In  Section  2 
the  type  of  data  coming  from  electrical  activity  of 
muscles  will  be  discussed.  In  Section  3  we  shall  propose 
the  McKeown  idea  of  motion  through  integration  of  sub¬ 
movements  [McKeown  M.J.,  200b].  The  computational 
model  incorporating  sub-movements  will  be  presented  in 
Section  4.  Section  5  is  devoted  to  the  proposal  of  NN 
schemes  to  implement  ICA.  Section  6  will  report  the 
results  achieved  by  processing  the  experimental  data. 
The  assumption  of  stationarity  of  the  electrical  signals 
will  be  relaxed  in  Section  7,  where  the  wavelet  approach 
will  be  proposed.  Finally,  some  conclusions  are  drawn. 


2.  ElectroMyographic  Data 

When  skeletal  muscle  fibers  contract,  they 
conduct  electrical  activity  (action  potentials,  APs)  that 
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can  be  measured  by  electrodes  affixed  to  the  surface  of 
the  skin  above  muscles  [Akay  M.,  1997].  As  the  APs 
pass  by  the  electrodes,  spikes  of  electrical  activity  are 
observed  and  pulses  of  muscle  fiber  contractions  are 
produced.  Small  functional  groups  of  muscle  fibers, 
termed  motor  units  (MUs),  contract  synchronously, 
resulting  in  a  motor  unit  action  potential  (MUAP).  To 
sustain  force,  an  MU  is  repeatedly  activated  by  the 
central  nervous  system  several  times  per  second.  The 
repetition,  or  average,  firing  rate  is  often  between  5  and 
30  times  per  second  (or  faster).  The  electromyographic 
(EMG)  signal  is  widely  used  as  a  suitable  means  to  have 
access  to  physiological  processes  involved  in  producing 
joint  movements.  The  information  extracted  from  the 
EMG  signals  can  be  exploited  in  several  different 
applications.  The  typical  sensors  used  for  EMG  are 
needle  (unipolar  or  bipolar)  sensors.  The  experimental 
data  here  analysed  come  from  non-invasive  surface  EMG 
sensors,  that  present  the  cross-talk  effect,  i.e.,  they  detect 
electrical  activities  from  several  muscles  simultaneously 
in  action. 


3.  Sensorimotor  integration  of  sub-movements 

A  growing  body  of  evidence  suggests 
movements  which  appear  smooth  to  the  naked  eye  are 
actually  composed  of  the  temporal  and  spatial 
superposition  of  discrete  sub-movements  precisely 
recruited  and  coordinated  by  the  central  nervous  system 
[Harris  C.M.,  1998].  However,  the  spatial  and  temporal 
overlap  of  sub-movements  has  generally  made  it 
impossible,  with  the  common  computational  tools 
available  to  the  neuroscientist,  to  isolate  the  effects  of 
individual  sub-movements  [Sejnowski  T.J.,  1998]. 

Extensive  computational  expertise  is  required  to 
adequately  interpret  the  data  gleaned  from  the 
experiments.  Detection  of  non-stationarity  in  the  sEMG 
and  kinematic  variables  is  necessary  to  detect  the  onset 
of  temporally  overlapping  sub-movements.  We 
investigate  the  information-theoretic  considerations  of 
channel  capacity  and  bandwidth  as  important 
determinants  in  the  selection  and  sensorimotor 
integration  of  individual  sub-movements. 

4.  Computational  Models  incorporating  Sub¬ 
movements 

Some  computational  approaches  have  attempted 
to  model  reaching  movements  as  incoiporating  sub¬ 
movements;  however,  they  have  not  addressed  many  of 
the  unanswered  questions  regarding  the  characteristics  of 
sub-movements.  Others  have  attempted  to  model 
reaching  movements  without  considering  sub¬ 
movements  at  all.  Smoothness,  an  empirical  observation 
of  motor  movements,  has  often  used  as  a  cost  function  to 
optimise  the  models.  Rather  than  define  sub-movements 
on  the  basis  of  the  velocity  profiles,  in  this  project  the 
sub-movements  are  defined  on  the  basis  of  muscular 
activity.  Empirically,  experienced  physical  therapists 
describe  “efficiency”  of  motor  movements  as  subjects 


progressively  recover.  At  some  stage  of  rehabilitation, 
people  are  able  to  mimic  normal  kinematics  but  still 
complain  of  muscle  aching  and  fatigue  due  to  excessive 
muscle  co-contraetion. 

Intuitively,  sub-movements  are  groups  of 
muscles  that  have  the  tendency  to  activate  together 
following  a  common  neural  input  We  assert  that  a  sub¬ 
movements  is  “hard-wired”  by  adulthood,  in  the  sense 
that  it  may  be  encoded  in  the  spinal  cord  as  part  of  a 
Central  Pattern  Generator  (CPG),  and  also  partly  reflect 
the  anatomical  distribution  across  several  muscles  of  a 
single  nerve  root  exiting  the  spinal  eord.  To  suggest  a 
computational  model  of  sub-movements,  we  initially 
make  the  stationarity  assumption.  Since  the  EMG  is  an 
indirect  measure  of  the  neural  command  to  the  muscle, 
the  Mutual  Information  (MI)  can  be  used  as  a  metric  to 
infer  the  recordings  from  two  EMG  electrodes  contain 
common  neural  input  M.  McKeown  has  proposed  using 
ICA  for  the  analysis  of  sEMGs,  demonstrating  that  the 
Independait  Components  (ICs)  are  more  strongly 
coupled  with  ongoing  brain  rhythms  (EEG)  than  the 
sEMGs  recordings  of  individual  muscles  [McKeown 
M.J.,  2000a].  The  ICA  model  can  be  used  to  provide  a 
useful  starting  point  for  the  rigorous  definition  of  a  sub¬ 
movement  upon  which  more  elaborate  models  can  be 
created.  Consider  numerous  simultaneous  sEMG 
recordings  deriving  from  several  electrodes  distributed 
over  many  muscles  during  a  coordinated  cortically- 
controlled  movement.  If  we  model  the  sEMGs  recorded 
from  each  electrode  to  be  the  linear  superposition  of 
activity  from  different  group  of  muscles  (possibly 
encoded  with  CPGs)  that  tend  to  co-activate,  the,  the  goal 
is  to  estimate  the  cortical  modulation  of  the  commonly 
influenced  muscles.  A  single  sub-movement  is  defined  as 
m(t)  =  U  C(t),  t=t0->tn,  where  m  is  a  column  vector, 
with  mj  representing  the  muscle  electrical  activity 
contributing  to  the  jth  electrode  as  a  function  of  time,  U 
is  a  stationary  column  vector  representing  the  relative 
weighting  that  a  given  cortical  command  gives  to  the 
different  muscle  areas,  and  C(t)  is  the  unknown  scalar 
neural  command  over  time.  If  several,  e.g.  p,  sub¬ 
movements  during  a  complex  movement  are  temporally 
(and  spatially)  overlapping,  the  linear  combination  of 
ink(tk)  outputs  M(t),  the  total  muscle  electrical  activity 
over  the  duration  of  the  whole  movement  and  Mj  is  the 
electrical  activity  recorded  from  the  jth  electrode,  Ck 
represents  the  relative  activation  of  the  kth  sub¬ 
movement  by  an  independent  cortical  command,  and  the 
matrix  Ujj,  has  as  its  columns,  Ut,  the  vectors  defining 
the  different  sub-movements.  If  we  assume  that  for  a 
given  time-period,  say  T,  a  constant  number  of  sub¬ 
movements,  c,  are  simultaneously  active,  thus,  we  have 
M  =  UC,  where  M  is  the  matrix  of  the  electrical  activity, 
C  is  the  matrix  of  presumed  independent  cortical 
commands,  and  U  is  a  matrix  defining  the  sub¬ 
movements.  The  goal  is  then,  given  the  recordings  from 
the  electrodes,  and  not  knowing  U,  to  estimate  the 
different  cortical  influences,  C.  If  the  Ck  are  assumed  to 
be  independent  and  c  can  be  estimated,  this  is  possible 
through  the  ICA. 
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5.  Neural  models  of  JCA 

Independent  Component  Analysis  (ICA)  can 
easily  be  introduced  as  a  straightforward  evolution  of  the 
well-known  statistical  technique  referred  to  as  Principal 
Component  Analysis  (PCA).  Nevertheless,  it  is  also 
possible  to  investigate  the  main  ideas  behind  ICA  from 
the  perspectives  of  both  leaming/neural  systems  and 
signal  processing  (blind  source  separation).  A  good 
definition  of  ICA  can  be  found  in  [Lee  T.W.,  1998):  ICA 
is  a  method  for  finding  a  linear  non-orthogonal  co¬ 
ordinate  system  in  any  multivariate  data.  The  directions 
of  the  axes  of  this  co-ordinate  system  are  determined  by 
both  the  second  and  higher  order  statistics  of  the  original 
data.  The  goal  is  to  perform  a  linear  transformation 
which  makes  the  resulting  variables  as  statistically 
independent  from  each  other  as  possible.  In  contrast  to 
correlation-based  transformations  such  as  PCA,  ICA  not 
only  decorrelates  the  signals,  through  second-order 
statistics,  but  also  reduces  higher-order  statistical 
dependencies.  Blind  source  separation  by  ICA  has 
received  attention  because  of  its  potential  applications  in 
signal  processing.  Here,  the  goal  is  to  recover 
independent  sources  given  only  sensor  observation  that 
are  unknown  linear  mixtures  of  the  latent  (unobserved), 
possibly  independent,  source  signals.  In  parallel  to  blind 
source  separation  researches,  the  ICA  emerged  within  the 
framework  of  unsupervised  learning.  In  particular, 
Linsker  [Linsker  R.)  firstly  proposed  an  algorithm  based 
on  information  theory  that  was  then  used  to  maximize  the 
mutual  information  between  the  inputs  and  the  outputs  of 
a  NN.  Each  neuron  of  an  “outpuf  layer  should  be  able  to 
encode  features  that  are  as  statistically  independent  as 
possible  from  other  neurons  over  another  ensemble  of 
“inputs”.  The  statistical  independence  of  the  outputs 
implies  that  the  multivariate  probability  density  function 
(pdf)  of  the  outputs  can  be  factorised  as  a  product  of 
marginal  pdfs.  Bell  and  Sejnowski  [Bell  A.J.,  1995], 
derived  stochastic  gradient  learning  rules  for  achieving 
the  prescribed  maximization.  The  same  Authors  put  the 
problem  in  terms  of  an  information-theoretic  framework 
and  demonstrated  the  separation  and  deconvolution  of 
linearly  mixed  sources  [Bell  A.J.,  1996). 

Among  the  various  approaches  proposed  in  the 
literature  to  implement  the  ICA,  the  approach  used  by 
McKeown  [Lee  T.W.,  1999]  is  the  algorithm  developed 
by  Bell  and  Sejnowski  [Bell  A.J.,  1995]  which  is  based 
on  an  Infomax  NN,  where  a  self-organizing  algorithm  is 
used  to  maximize  the  information  transferred  in  a 
network  of  non-linear  units.  The  general  framework  of 
ICA  is  now  simply  described  as  the  blind  separation 
problem,  typically  introduced  by  the  “cocktail  party 
problem”;  we  have  n  different  sources  Sj  (that  is,  the 
speakers  i=l ,. .  .,n)  and  m  different  linear  mixtures  Xj  (that 
is,  the  microphones  j=l,...,m).  By  referring  to  j  as  the 
matrix  of  the  observed  signals,  and  as  s  the  matrix  of  the 
independent  components,  the  matrix  W,  called  unmixing 
matrix,  satisfies  the  following  property: 

or,  by  defining  the  mixing  matrix  A  as: 


x  =  A-s 


(2) 


then  the  mixing  and  unmixing 
following  equation: 


=  A 


matrixes  are  related  by  the 


(3) 


5.7  The  ICA  based  on  the  information  maximization 
by  using  a  neural  network  approach 

Bell  and  Sejnowski  derived  a  self-organizing 
learning  algorithm  to  maximize  the  information 
transferred  to  a  NN  of  non-linear  units.  The  non-linear 
transfer  functions  pick  up  the  higher-order  moments  of 
the  statistical  distribution  of  the  input  data,  and, 
moreover,  they  are  able  to  reduce  the  redundancy  in  the 
output  data.  Higher-order  methods  use  information  on 
the  distribution  of  x  that  is  not  contained  in  the 
covariance  matrix.  This  fact  becomes  meaningful  when 
the  distribution  of  x  is  non  Gaussian,  since  it  is  possible 
to  assume  that  the  covariance  matrix  of  a  zero  mean 
Gaussian  variable,  contains  the  whole  information 
carried  by  this  variable.  By  defining  the  differential 
entropy  for  a  continuous  random  variable  x  as; 

7f(x)  =  - 

when  f(x)  is  the  probability  density  function  of  the 
considered  variable.  The  conditional  differential  entropy 
is  defined  as  follows; 

n{y  I  ^) = -  I  ■  ’"[a 


It  represents  to  the  variations  that  occur  in  the 
information  carried  by  y  when  x  is  observed.  Finally  the 
mutual  information  between  two  variables  x  and  y  is 
given  by. 

M]{x,y)  =  H[x)-H[x\y)=H{y)-H{y\x)  (6) 

This  quantity  measures  the  information  that  is 
added  to  x  when  y  is  observed,  or  to  y  when  x  is 
observed.  The  mutual  information  of  (x,  y)  zeroes,  when 
and  only  when  the  variables  are  independent  The  Bell- 
Sejnowski  approach  is  based  on  the  use  of  a  NN  able  to 
minimize  the  mutual  information  between  the  input  x  and 
the  output  y  of  the  neural  network  where  y  are  the 
independent  components.  If  we  suppose  to  have  noise- 
free  input  data,  y  can  be  obtained  from  x  by  a 
deterministic  manner:  in  this  case,  H(y\x)  assumes  its 
lowest  value  (-oo).  The  problem  in  this  case  is  that  the 
density  functions  of  the  unknown  components  cannot  be 
computed,  and  therefore  the  H(y\x)  is  difficult  to  be 
estimated.  This  drawback  can  be  overcame  by  taking  into 
account  that,  if  y  can  be  computed  from  x  by  an 
invertible  continuous  deterministic  mapping,  the 
maximization  of  Ml(x\y)  corresponds  to  maximize  the 
entropy  of  the  outputs.  In  the  NN  case,  we  have  to 
maximize  the  H(y)  with  respect  to  the  networic 
parameters  tf.  If  we  have  just  one  input  x  and  one  output 
y,  if  the  mapping  from  x  to  y  is  defined  as  y=g(x),  and  if 
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g(*)  has  a  unique  inverse,  then  the  probability  density 
function  of  y  can  be  computed  as: 
lA,  1-1 

1^1  AW 


fy{y)  = 


dx 


The  differential  entropy  of y  is  given  by: 
//(v>-£[ln(/,))]=-£/>>ln|/,(p)]r:?p= 

-4ln(4W)] 


da 


To  maximize  the  differential  entropy,  we  need 
to  maximize  just  the  first  term.  This  maximization  is 
carried  out  by  a  stochastic  gradient  ascent  learning 
rule,  where  the  update  step  can  be  computed  as: 


A  - 

Aw  oc =  ^ 

dw 


5 

f 

In 

N 

>1 

dw 

1 

dx 

/ 

^dx, 

dwydXy 

(9) 


If  g(»)  becomes  the  logistic  transfer  function,  of 
the  scaled  and  translated  input 

1  (10) 

l  +  g-(w-x+wo) 

the  update  term  can  be  rewritten  as  the  update  step  for 
the  weight  w: 

1  r  .  ^ 

Aw  oc  —  +  X  ■  (1  -  2>'  j 
w 

and  the  update  step  for  the  bias  weight  can  be  computed 
as: 

Awq  oc  1  -  2>' 

In  the  most  general  multivariate  case,  we  have: 

where  I,  is  the  transformation  Jacobian.  The  update  step 
for  the  matrix  weight  becomes: 

^oc£"^+(l-2>')-/ 

where  i  is  a  unit  column  vector  and  the  update  step  for 
the  bias  weight  vector  can  be  computed  as: 

AW|i  oc  1  -  2  V  (1^) 

The  input  data  are  measurements  of  N  different 
input  sources,  and,  therefore,  they  can  be  referred  to  as  a 
matrix  x,  where  the  i-th  column  represents  the  i-th 
sample  of  the  each  source.  The  inputs  of  the  neural 
network  are  h=W  x »  and  &  are  called  sphered  data.  The 
sphered  data  are  computed  by  zero-meaning  the  input 
data  X  and  by  sphering  these  data  with  the  following 
matrix  operation: 

(16) 


X 


2o 


-I 


(17) 

(18) 


where  S  is  called  sphering  matrix,  and  it  is  used  to  speed 
the  convergence.  The  infomax  NN  estimate  the  matrix  y, 
where  the  i-th  column  represents  the  i-th  sample  of  the 


each  independent  component  The  architecture  of  the 
neural  network  is  depicted  in  Figure  1. 
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Figure  1-  Architecture  of  the  Infomax  Neural  Network 


5.2  ICA-NN  scheme  based  on  contrast  functions 

The  Infomax  NN  described  in  the  previous 
Section  has  some  limitations,  both  on  the  kind  of  source 
signals  pdf  and  in  the  computational  load.  In  this  Section 
we  will  describe  a  different  NN  scheme  to  extract  ICs 
that  is  most  suitable  to  solve  our  problem.  The  proposed 
NN  is  also  useful  to  cope  with  time-varying  mixtures 
[Koivunen  V.,  2001]. 

The  goal  of  ICA  is  to  make  a  transform  into  a 
signal  space  in  which  the  signals  are  statistically 
independent.  Sometimes  independence  can  be  attained, 
especially  in  blind  source  separation  in  which  the 
original  signals  are  linear  mixtures  of  independent  source 
components  and  the  goal  of  ICA  is  to  invert  the  unknown 
mixing  operation.  Even  when  independence  is  not 
possible,  the  ICA  transformation  produces  usefol 
component  signals  that  are  non-Gaussian.  The  ICA 
allows  us  to  approximately  take  into  account  all  higher- 
order  correlations  and  make  the  signals  traly 
independent.  Higher  order  statistics  are  needed  to 
determine  ICA  expansion.  In  the  framework  of  NNs,  the 
ICA  structure  is  that  of  a  linear  network  that  after 
learning  is  of  the  purely  feed-forward  type.  However, 
during  learning  non-linearity  must  be  used  for  separating 
sources.  We  assume  here  that  we  have  a  set  of  noisy 
linear  mixtures  representing  the  observed  signal.  By 

denoting  with  Xk  =  [Xk  (1) . Xk(M)]^  the  M- 

dimensional  kth  data  vector  corresponding  to  the 
measurements  carried  out  at  discrete  point,  we  can  write 
the  ICA  signal  model  in  the  vector  form: 


+  (19) 

Here  Sk  is  the  source  vector  consisting  of  the 
independent  signal  components  (sources),  Sk(i),  i=l ,  N,  A 
=  [a(l),  a(N)]  is  a  constant  MxN  “mixing  matrix” 

whose  columns  a(i)  are  the  basis  vectors  of  ICA,  and  iik 
denotes  possible  cormpting  noise,  often  omitted,  because 
it  is  not  possible  to  distinguish  noise  from  source  signals. 
The  source  separation  aim  is  to  determine  Sk,  knowing 
only  jCk-  Several  assumptions  must  be  made  in  ICA,  in 
particular,  only  one  of  the  source  signals  is  allowed  to 
have  a  Gaussian  marginal  distribution.  Typically,  the 
basis  vectors  a(i)  are  normalized  to  unit  lengA  and 
arranged  according  to  the  powers  E  [Sk  (i)  in  a  similar 
way  as  in  standard  PCA.  In  PCA,  the  data  model  has  the 
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same  form,  but  the  coefficient  Sk(i)  are  required  to  have 
sequentially  maximal  powers  (variances),  and  the  basis 
vectors  a(i)  are  constrained  to  be  mutually  orthonoimal. 
Usually,  the  basis  vectors  of  ICA  are  not  mutually 
orthogonal,  in  order  to  better  characterize  the  data.  The 
ICA  allows  to  determine  a  sparse  encoding  of  the  input 
vector,  where  histograms  show  a  high  probability  of  a 
large  response  as  well  as  of  no  response  at  all.  The  code 
increases  first-order  redundancy  (histograms)  by 
decreasing  higher-order  redundancy.  This  redundancy 
transformation  can  be  described  in  terms  of  kurtosis,  that 
is  defined  by  (E[.]  denotes  expectation): 


of  the  eigenvectors  of  x  and  D  is  the  diagonal  matrix  of 
eigenvalues  that  produces  a  starting  point  for  an  iterative 
process  that  finds  vector  W-  The  learning  rule  is; 

W  (k+1)  =  E  [y  g(S(k)  ^  y)  -  g  V)  M(k)l  (21) 

where  g(.)  is  the  hyperbolic  tangent.  After  finding  W,  the 
IC’s  can  be  found  by  linear  combination  y  =  ^  y  and 
the  mixing  matrix  A  by  A  =  E  D  W. 

The  use  of  ICA  network  allows  us  to  determine 
the  ICA  separating  matrix. 


k[s(i)1  =  E[s(i)^]-3[E[s(i)^]]l 


(20)  6.  Experimental  EMG  data  processing  results 


The  separation  capability  of  various  algorithms 
depends  on  the  kurtosis  [Ref,  Kar].  It  is  possible  to 
realize  the  estimation  procedure  by  using  a  feed-forward 
scheme.  The  inputs  of  the  NN  are  the  M  components  of 
the  vector  x.  In  the  hidden  layer,  we  have  N  nodes.  The 
first  layer  of  weights  carry  out  a  MxN  whitening  (and 
compression)  of  the  input  vector.  After  this,  the  sources 
are  separated  by  means  of  an  orthonormal  matrix 
=  I  k)  that  the  NN  should  leam.  The  ICA  network,  firstly 
proposed  in  [Karhunen  J.,  1997]  is  shown  in  Figure  2. 
Non-linearity  (i.e.,  hyperbolic  tangent  function)  must  be 
used  in  learning  the  separating  matrix.  The  learning 
algorithm  here  used  is  described  in  [Karhunen  J.,  1997] 
and  can  be  summarized  as  follows;  whitening  of  the 
original  data  x  by  y  =  D  E  x.  where  E  is  the  matrix 


The  ICA-NN  scheme  proposed  in  the  previous 
Section  has  been  used  to  extract  ICs  from  sEMG 
recordings.  In  what  follows,  we  will  report  some  results 
that  have  been  achieved  in  this  study.  The  following 
Table  reports  the  correspondence  between  the 
placements  of  sEMG  electrodes  and  the  related  muscles. 
Figure  3  reports  an  example  of  the  signal  acquired  during 
about  2  s  of  exercise  (corresponding  to  pointing  the 
monitor  of  a  computer  with  alternatively  the  right  and  the 
left  hand).  Figure  4  reports  the  time-course  of  the  6*  ICs, 
that  appears  to  be  mostly  correlated  with  the  4*  sEMG 
sensor. 


5  Hidden 
nodes 


y  =  ffi'^Yx 

Components  of  y  as  independent 
as  possible 

Weight  matrix  Q  minimizes  the 

MSE  error 

E{||x-Qy||M 

Orthonormal 

(W^  W  =  D  separation  matrix 
that  the  network  should  leam 


Wliilcning  Scpriniling  Esliireilcd  ICA 

matrix  matrix  basis  matrix 


Fig.  2-  The  Neural  Network  feed-forward  scheme  for  computing  ICA. 


1  SPec 

Superior  Pectoralis 

9  DBic 

Distal  Bicep 

2  IPec 

Inferior  Pectoralis 

10  PTri 

Proximal  Tricep 

3  LPec 

Lateral  Pectoralis 

1 1  DTri 

Distal  Tricep 

4  LDel 

Lateral  Deltoid 

12  PWEx 

Proximal  Wrist 

Extensors 

5  ADel 

Anterior  Deltoid 

13  DWEx 

Distal  Wrist  Extensors 

6  MTrp 

Medial  Trapezius 

14  PWFl 

Proximal  Wrist  Flexors 

7  LTrp 

Lateral  Trapezius 

15  DWFl 

Distal  Wrist  Flexors 

8  PBic 

Proximal  Bicep 

16  APB 

Abductor  Pollicus  Brevis 

Table  1 ;  Correspondence  between  the  electrode  locations 
and  the  investigated  muscles 
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Each  ICs  consists  of  a  temporally  independent 
waveform  and  a  spatial  distribution  over  the  electrodes. 
The  spatial  distributions  of  the  electrodes  is  shown  on  a 
cartoon  body.  The  diagram  has  been  obtained  by  making 
use  of  the  MATLAB  Toolbox  for  Electrophysiological 
Data  Analysis,  Version  3.2  (S.  Makeig,  et  al,  available 

online,  httD://www.cnl. salk.edu/scott/ica.html'). 

The  electrodes  are  positioned  according  to  Table 
1.  The  colouring  of  each  electrodes  is  proportional  to  the 
particular  IC  contributes  to  the  electrode’s  raw  recording. 
In  the  example,  it  is  shown  that  the  6*  ICs  mostly 
contributes  to  the  4*  electrode  reading.  Note  the 
unmixing  of  the  related  components,  basically  activating 
just  one  electrode.  Figures  6  to  8  reports  the  same  signals 
for  the  le*  electrode  and  the  16“’  ICs.  In  this  case,  the 
16“'  component  mainly  activates  the  same  electrode. 

Measuring  the  ICs  of  sEMG  will  provide  a  more 
reliable  and  robust  measure  of  motor  performance  than 


interpreting  the  activity  of  each  individual  muscles  in 
isolation  [Jung  T.P. ,  200 1  ] . 

There  are  practical  advantages  of  separating  the 
sEMG  signals  into  temporally  ICs,  namely,  the  ICs  are 
less  susceptible  to  changes  in  position  of  the  electrodes, 
and  therefore  more  suitable  for  serially  monitoring 
performance;  the  ICs  are,  in  addition,  more  likely  to 
correspond  to  brain  activations  [Jung  T.P.,  2001],  by 
looking  for  common  cortical  influences  in  the  muscle 
activity. 

As  previously  mentioned,  the  experiment 
described  in  the  present  Section  have  been  carried  out  by 
using  a  Neural  Network  scheme  to  implement  ICA.  It  is, 
of  course,  possible,  to  use  different  techniques  to 
implement  ICA,  however,  it  could  be  demonstrated  that 
the  use  of  a  NN  approach  is  equivalent  to  other 
approaches,  like  maximum  likelihood  estimation.  The 
NN  scheme  is  most  suitable  to  achieve  hardware 
implementation. 


'  ^  Ch#hn«4 


Figure  3:  Raw  EMG  recording  from  the  4th  electrode 
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Figure  7:  Time-course  of  the  16lh  extracted  ICs 
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Figure  8;  Spatial  distribution  of  the  activations  corresponding  to  the  lb""  ICs 


7.  Treatment  of  non-stationarity 

The  extraction  of  ICs  is  based  on  the  assumption 
of  stationarity  among  different  trials  of  the  same 
experiment.  In  the  practice,  for  such  sEMG  data,  this  is  a 
hardly  acceptable  assumption.  We  would  like  now  to 
propose  a  time-frequency  approach  to  the  analysis  of 
sEMG  data  (or  their  ICs  counteiparts)  that  allows  to  cope 
with  signal  non-stationarity.  The  sEMG  is  indeed  non¬ 
stationary  as  its  statistical  properties  change  over  time. 
The  MUAPs  (Motor  Unit  Action  Potentials)  are 
transients  that  exist  for  a  short  period  of  time;  for  that 
reason,  time-frequency  methods  are  useful  to 
characterize  the  localized  frequency  content  of  each 
MUAP.  The  use  of  a  time-frequency  representation  also 
allows,  in  principle,  to  detect  the  onset  of  sub¬ 
movements,  according  to  what  we  explained  in  the 
previous  Sections.  We  have  carried  out  the  wavelet 
analysis  in  both  the  time  domain  of  sEMG  and  of  the 
ICs,  in  order  to  show  that  this  kind  of  analysis  should  be 
carried  out  on  the  original  space  (the  IC  space  is 
generated  by  already  making  a  stationarity  assumption). 

The  wavelet  transform  also  guarantees  to 
possibility  of  not  specifying  in  advance  the  key  signal 
features  and  the  optimal  basis  functions  needed  to  project 
the  signal  in  order  to  highlight  the  features.  An 
orthogonal  wavelet  transform  is  characterized  by  two 
functions: 

1)  the  scaling  function, 

<t>[x)^42'Z,^,Kk),f>{2x-k)  (22) 

and  2)  its  associated  wavelet: 

¥{x)=^'Z,^,g{k),l>(2x-k)  (23) 

where  g(k)  is  a  suitable  weighting  sequence  (function). 


The  sequence  h  (k)  is  the  so-called  refinement 
filter.  The  wavelet  basis  functions  are  constructed  by 
dyadic  dilation  (index  j)  and  translation  (index  k)  of  the 
mother  wavelet; 


y/j,  =  2  >'^yf(xlT>  -  k) 


(24) 


The  sequences  h  and  g  can  be  selected  such  that 
l(y*)€Z“  constitutes  an  orthonormal  basis  of  L2,  the 
space  of  finite  energy  functions.  This  orthogonality 
permits  the  wavelet  coefficients  d j{k)  =  (^f  ,¥ jk) 

the  approximation  coefficients  C j  {k)  =  ,  ^jk  ^  of  any 


function  f(x)  to  be  obtained  by  inner  product  with  the 
corresponding  basis  functions.  In  practice,  the 
decomposition  is  only  carried  out  over  a  finite  number  of 
scales  J.  The  wavelet  transform  with  a  depth  J  is  then 
given  by; 

^  (25) 

fix)  =  %Yj^jik)¥jk  +  Y.^jik)^jk  ■ 

y=l  iteZ  AeZ 


In  the  present  study,  we  shall  use  the  WT  in 
order  to  derive  a  set  of  features  that  can  reveal  singularity 
of  the  signal  (corresponding  to  the  onset  of  activity  of 
single  muscles)  and  to  detect  the  precursors  of  the  non- 
stationarity.  A  set  of  features  derived  from  the  inspeetion 
of  the  scale-dilation  plane  have  been  used  as  input  vector 
of  an  auto-associative  NN  that  is  able  to  alarm  the  user 
about  modification  of  the  energy  content  of  the  spectrum. 
The  features  are  extracted  by  considering  the 
correspondence  between  singularities  of  a  function  and 
local  maxima  of  its  wavelet  transform.  A  singularity 
corresponds  to  pairs  of  modulus  maxima  across  several 
scales.  Feature  extraction  is  accomplished  by  the 
computation  of  the  singularity  degree  (peakiness),  i.e.. 
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the  local  Lipschitz  regularity,  which  is  estimated  from 
the  wavelet  coefficients  decay  [Mallat  S.,  1992,  Arkidis 
N.S.,  2002], 

Figures  9  and  10  reports  the  amplitude  sEMG 
signal  for  channel  4*,  and  the  wavelet  transform  obtained 
by  using  Daubechies  1  and  4  mother  wavelet.  The 
modulus  maxima  plots  have  been  drawn  and  a 
thresholding  operator  is  used  in  order  to  reduce  the 
number  of  effective  wavelet  coefficients  needed  to 
represent  the  original  functions.  Once  the  features  have 
been  extracted  by  inspecting  the  modulus  maxima  plot, 
we  can  use  the  corresponding  nonzero  coefficients  in 
order  to  predict  the  raising  of  non  stationarity.  A  MLP 
NN  with  an  input  layer  of  corresponding  size  acts  as  a 
bottleneck  network  (the  output  size  is  the  same  of  the 
input  one,  while  the  hidden  layer  size  is  considerably 
reduced).  The  NN  fed  by  the  wavelet  coefficients 
computes  the  estimation  of  the  corresponding  wavelet 
coefficients  at  the  output:  a  reconstruction  error  is 
computed.  If  the  error  overcomes  a  prescribed  threshold 
level,  the  non-stationarity  signal  is  activated  and  the 
following  trials  are  used  to  compute  a  novel  matrix  (ICs) 
weights.  The  use  of  a  MLP-NN  is  not  obliged  to  ensure 
accuracy  or  success  in  the  reconstruction;  for  example,  a 


different  compression  scheme  could  be  used,  like  the 
Singular  Value  Decomposition.  The  bottleneck  layer  is, 
in  principle,  able  to  work  as  principal  component 
extractor,  but  the  idea  here  is  to  build  a  compressed 
representation  which  is  deliberately  redundant  The 
reconstruction  error  could  be  sub-optimal  with  respect  to 
different  schemes,  but  optimality  comes  at  the  expenses 
of  quite  low  fault  tolerance.  Finally,  the  MLP  NN  can  be 
implemented  easily  in  a  FPGA  hardware  chip.  A  typical 
case  of  non-stationarity  is  the  onset  of  fatigue.  The 
Figure  1 1  describes  how  the  activation  intervals  [Micera 
S.,  2001]  of  the  muscles  during  the  exercise  cycle  are 
determined  starting  from  the  ICs. 

The  standard  approach  to  determine  on-off 
activation  patterns  is  to  process  each  epoch  by  means  of 
a  double  threshold  statistical  detector  [Bonato  P.,  1998, 
Balestra  G.,  2001]  to  obtain  the  muscle  detection 
intervals.  We  have  compared  the  results  achieved  by  our 
method  with  the  one  described  and  we  have  found  an 
improvement  of  about  20%  in  the  performance. 


Figure  9:  The  wavelet  transform  of  the  4th  sEMG  channel  (mother  wavelet,  Daubechies  1):  the  raw 
data  recording  (top),  the  plot  of  the  absolute  values  of  the  WT  coefficients  (middle)  and  the  modulus 
maxima  extracted  (bottom).  A  thresholding  is  applied  to  suppress  WM  that  are  not  of  interest.  White 
colour  corresponds  to  high  value  of  the  coefficients.  If  one  uses  a  wavelet  with  one  vanishing  moment, 
then  the  bottom  plot  corresponds  to  the  maxima  of  the  smoothed  first-order  derivative  of  the  function. 
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Figure  10:  The  wavelet  transform  of  the  4th  sEMG  channel  (mother  wavelet,  Daubechies  4):  the  raw 
data  recording  (top),  the  plot  of  the  absolute  values  of  the  WT  coefficients  (middle)  and  the  modulus 
maxima  extracted  (bottom).  White  colour  corresponds  to  high  value  of  the  coefficients.  A  wavelet 
function  with  4  vanishing  moments  is  used. 


Figure  1 1 :  The  determination  of  the  activation  intervals  (the  wavelet  envelope  is  used). 
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8.  Conclusion 

The  paper  proposed  the  use  of  some  NNs  to 
process  experimental  electrical  data  derived  from  non- 
invasive  sEMG  experiments.  The  original  (raw)  data 
have  been  analysed  by  a  neural  IC  processor  aiming  to 
obtain  signals  that  can  be  easily  correlated  to  cortical 
activity.  The  assumption  of  stationarity  is  then  relaxed  in 
order  to  cope  with  time-vaiying  mixing  systems,  more 
adherent  to  the  biophysical  problem  at  hand.  An  auto- 
associative  NN  exploits  the  features  obtained  by  wavelet 
transforming  the  raw  data  for  making  a  quick  and 
efficient  prediction  of  non-stationarity.  The  results  we 
have  shown  can  be  considered  just  as  preliminary  to 
solve  the  difficult  problem. 
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Abstract  :  In  this  paper,  we  present  a  new  approach  for 
modeling  the  high-frequency  effects  of  embedded  passives 
in  multilayer  printed  circuits,  utilizing  state  space  equations 
or  equivalent  circuit  together  with  neural  network 
techniques.  In  this  approach,  the  neural  network  based 
model  structure  is  trained  using  full  wave  electromagnetic 
(EM)  data.  The  resulting  embedded  passive  models  are 
accurate  and  fast,  can  be  used  in  both  frequency/time 
domain  simulators.  Examples  of  embedded  resistor  and 
capacitor  models  demonstrate  that  the  combined  model  can 
accurately  represent  EM  behavior  in  microwave/RF  circuit 
design.  In  high-level  circuit  design,  we  applied  our 
combined  EM  based  neural  models  for  signal  integrity 
analysis  and  design  of  multilayer  circuit  to  illustrate  that  the 
geometrical  parameters  can  be  continuously  adjusted  by 
using  neural  network  techniques.  Optimization  and  Monte- 
Carlo  analysis  are  performed  showing  that  the  combined 
models  can  be  efficiently  used  in  place  of  computationally 
intensive  EM  models  of  embedded  passives  to  speed  up 
circuit  design. 

I,  Introduction 

The  drive  in  the  electronics  industry  for  manufacturibility- 
driven  design  and  time-to-market  demands  powerful  and 
efficient  computer-aided  design  (CAD)  techniques.  As  the 
signal  frequency  increase,  the  dimensions  of  embedded 
passives  in  multilayer  circuits  become  a  significant  fraction 
of  signal  wavelength.  The  conventional  time/frequency 
domain  electrical  models  for  the  components  are  not 
accurate  anymore.  As  EM  effects  play  an  important  role  in 
microwave/RF  circuit  design,  models  with  continuous 
physical/geometrical  information  must  include  EM  effects 
[1],  Furthermore,  the  need  of  optimization  and  statistical 
analysis  taking  into  account  process  variations  and 
manufacturing  tolerances  in  the  components  makes  it 
extremely  important  that  the  component  models  should  be 
accurate  and  fast  so  that  the  design  solutions  can  be 
achieved  feasibly  and  reliably. 

Recently,  artificial  neural  network  (ANN)  modeling 
approach  has  been  studied  for  microwave  modeling  and 
design  [2-4].  The  neural  models  can  be  as  fast  as  empirical 
models  and  as  accurate  as  detailed  physics  models. 

For  high-level  circuit  design,  the  component  models  should 
be  continuously  varied  both  with  frequency,  geometrical 
and/or  electrical  parameters.  Therefore,  modeling 
techniques  that  can  provide  such  continuous  variations  are 
essential  and  ANN  models  exactly  meet  for  these 
requirements.  They  are  continuous,  multi-dimensional  and 


can  easily  handle  nonlinearities  in  problem  behaviours. 
Neural  network  techniques  have  been  widely  used  to  model 
variety  of  microwave  device/circuits  such  as  transmission 
line  components  [5],  bends  [6],  vias  [7],  spiral  inductors  [8], 
and  FET  devices  [5,9]. 

Embedded  passives  represent  an  emerging  technology  area 
that  has  the  potential  for  increased  reliability,  improved 
electrical  performance,  size  shrinkage,  and  reduced  cost. 
The  conventional  approach  for  circuit  and  system  design 
requires  equivalent  circuits  to  capture  the  response  of 
embedded  passives  [10].  But  the  equivalent  circuit  method 
may  not  be  accurate  enough  to  reflect  high  frequency  EM 
effects.  Recently,  neural  network  techniques  have  been 
introduced  to  model  frequency  behavior  of  embedded 
passives  [1].  However,  such  ANN  models,  trained  to  learn 
S-parameters  data,  cannot  be  used  directly  into  time-domain 
circuit  simulation  and  optimization.  Our  target  was  to 
develop  passive  ANN  based  models  from  EM  data  that  can 
be  used  directly  in  both  time  and  frequency  domain  circuit 
design. 

In  this  paper,  we  present  a  novel  approach  to  model  high- 
frequency  effects  of  embedded  passives  in  multilayer 
printed  circuits  based  on  combined  equivalent  circuit  or 
state  space  theory  together  with  neural  networks.  Our 
combined  model  is  a  hierarchical  structure  with  two  levels. 
In  the  lower  level,  a  neural  network  maps  the 
geometrical/physical  parameters  of  the  passive  component 
into  coefficient  matrices  of  state  equations  or  lumped 
component  values  of  a  given  equivalent  circuit.  In  the 
higher  level,  we  export  the  coefficient  matrices  into  the  state 
space  equation  or  component  values  into  the  equivalent 
circuit  to  compute  the  EM  response  in  either  frequency  or 
time  domain  circuit  design.  The  accurate  and  fast  ANN 
based  embedded  passive  models  are  trained  from  full  wave 
EM  data.  Our  method  combines  existing  modeling 
techniques  and  recent  neural  network  approaches  to 
efficiently  perform  simulation  and  optimization.  Based  on 
neural  network  techniques,  geometrical/physical 
parameters  become  design  variables  to  improve  circuit 
performance  and  reduce  design/manufacture  cost. 

In  Section  II,  the  problem  for  neural  modeling  of  embedded 
passives  is  summarized.  In  Section  III,  we  present  the 
combined  equivalent  circuit  and  neural  network  (EC-NN) 
modeling  approach.  The  combined  State  space  equation  and 
neural  network  (SSE-NN)  modeling  approach  is  presented 
in  section  IV.  The  method  is  demonstrated  by  embedded 
resistor  and  capacitor  examples  in  section  V.  Signal 
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integrity  of  multilayer  circuit,  which  includes  SSE-NN 
models  of  embedded  passives,  is  used  to  demonstrate  the 
application  of  the  model  for  circuit  simulation.  Optimization 
and  Monte-Carlo  analysis  are  performed  showing  that  the 
geometry  inputs  can  be  continuously  adjustable  by  using 
our  combined  models  and  the  model  evaluation  is  much 
faster  than  computationally  intensive  physical/EM  model  of 
passives  in  microwave  design. 

II.  Embedded  Passives  Neural  Modeling:  Problem 
Statement 

Let  X  represent  a  M^-vector  containing  parameters  of  a 
microwave  device/circuit,  e.g.,  length  and  width  of  an 
embedded  resistor,  or  thickness  and  dielectric  constant  of  an 
embedded  capacitor.  Let  y  represent  a  -vector 

containing  the  responses  of  the  component  under 
consideration,  e.g.,  Y-  or  S-parameters.  The  physics/EM 
relationship  between  ^  and  x  can  be  highly  nonlinear  and 
multi-dimensional.  The  theoretical  model  for  this 
relationship  may  not  be  available,  or  theory  may  be  too 
complicated  to  implement,  or  the  theoretical  model  may  be 
computationally  too  intensive  for  online  microwave  design 
and  repetitive  optimization  (e.g.,  3D  full-wave  EM  analysis 
inside  a  Monte-Carlo  statistical  design  loop).  We  aim  to 
develop  a  fast  and  accurate  neural  model  by 
teaching/training  a  neural  network  to  learn  the  embedded 
passive  problem.  Let  the  neural  network  model  be  defined 
as 

y=fANN{x,w)  (1) 

where  w  represents  the  parameters  inside  the  neural  network 
also  called  as  the  weight  vector.  The  most  widely  used 
neural  network  structure  is  the  feedforward  multilayer 
perceptrons  (MLP)  [2,  5,  7]  where  neurons  are  grouped  into 
layers,  and  each  neuron  in  a  layer  acts  as  a  smooth  switch 
that  produces  a  response  between  low  and  high  state 
according  the  weighted  responses  of  all  neurons  from  the 
preceding  layer.  The  neural  network  structure  allows  the 
ability  to  represent  multidimensional  nonlinear  input/output 
mappings  accurately,  and  to  evaluate  y  fromx  quickly.  To 
enable  a  neural  network  to  represent  a  specific  microwave  x 
-y  relationship,  we  first  train  the  neural  network  to  learn 
the  microwave  data  pairs  (jc,,  rf,)  where  at,  is  a  sample  of  x,d , 
is  a  vector  representing  the  y  data  generated  from 
microwave  simulation  or  measurement  under  given  sample 
Xj,  and  i  is  the  sample  index.  For  training  purpose,  we  define 
an  error  function  E(h')  as 

^  lefr  k=\ 

where  rf*,  is  the  element  of  </,,  /ann  t  i^i  ■’*’)  ^ 

output  of  the  neural  network  for  input  sample  x,  and  Tr  is  an 
index  set  of  all  training  samples.  The  objective  of  neural 
network  training  is  to  adjust  neural  network  connection 


weights  w  such  that  ECw)  is  minimized.  A  trained  neural 
model  can  then  be  used  online  during  microwave  design 
stage  providing  fast  model  evaluation  replacing  original 
slow  model  from  EM  simulators.  The  benefit  of  the  neural 
model  is  especially  significant  when  the  model  is 
repetitively  used  in  design  processed  such  as  optimization, 
Monte-Carlo  analysis,  and  yield  optimization  [11]. 
However,  MLP  models,  trained  to  learn  S-parameters  data, 
cannot  be  used  directly  into  time-domain  circuit  simulation 
and  optimization.  We  aim  to  develop  a  fast  and  accurate 
combined  model,  which  uses  equivalent  circuit  and  neural 
network,  through  EM  data  to  learn  the  embedded  passive 
problem. 

Let  gp  =  {R,  L,  C}  be  a  Ap-vector  containing  the  values  of 
lumped  components  of  a  given  equivalent  circuit  topology  Tp. 
We  use  a  neural  network  to  represent  gp  as 

gp=fANN^X,w)  (3) 

and  then  the  combined  model  can  be  defined  as 

y{(o)=ff^p{fAm{x,w)\co)  (4) 

y{t)=fXTp{fANNM\f)  (5) 

where  O)  is  the  angular  frequency,  y{co)  and  are  the 

combined  model  response  in  frequency  and  time  domain 
respectively,  e.g.,  y(co)  can  be  S-  or  Y-parameters  and  y(J) 
can  be  the  currents  /(/)  and  voltages  v(0  of  a  two  port 
embedded  passive.  Therefore,  a  eombined  model  realizes 
the  X  -y/y  relationship  through  a  MLP  and  then  equivalent 

circuit. 

III.  Combined  Equivalent  Circuit  and  Neural 
Network  (EC-NN)  Modeling  Approach 

A.  Introduction  of  EC-NN  Model 

A  number  of  fast  equivalent  circuit  models  of  embedded 
passive  components  are  available.  In  [12,  13],  two  methods 
are  presented  for  developing  equivalent  circuit  using 
optimization  methods.  Synthesize  lumped  element 
equivalent  circuit  from  rational  function  is  presented  in 
[10].  Although  we  can  get  equivalent  circuit  in  many  ways 
from  measured  or  simulated  EM  data,  an  equivalent  circuit 
only  represents  a  fixed  embedded  passive  structure.  If  the 
embedded  passive’s  geometrical/physical  parameters  need 
to  be  changed,  we  have  to  re-generate  a  new  equivalent 
circuit  to  match  it. 

In  this  paper,  EC-NN  model  exploits  neural  network 
features  to  accurately  predict  element  values  of  equivalent 
circuit  based  on  geometrical/physical  parameters.  EC-NN 
model,  motivated  by  [14],  is  a  hierarchical  model  with  two 
levels.  At  the  lower  level,  a  neural  model  maps  the 
geometrical/physical  parameters  of  the  passive  component 
into  lumped  component  values  of  a  given  equivalent  circuit. 
At  the  higher  level,  we  supply  these  values  into  the 


DING,  et  al.:  STATE  SPACE  FORMULATION  FOR  MODELING  OF  PASSIVES  IN  PRINTED  CIRCUITS 


91 


equivalent  circuit  to  compute  the  EM  response  in  frequency  IV.  Combined  State  Space  Equation  and  Neural 
or  time  domain  circuit  design.  Network  (SSE-NN)  Modeling  Approach 


B.  EC-NN  Model  development 

We  utilize  an  existing  equivalent  circuit  and  combine  it  with 
a  MLP  together  to  make  the  model  automatically  as 
function  of  geometrical/physical  parameters.  The  EM  data 
of  embedded  passives,  which  consists  of 
geometrical/physical  parameters  as  inputs  and 
real/imaginary  parts  of  S-parameters  as  outputs,  are 
generated  by  simulation  or  measurement. 


A.  Formulation  in  Frequency-Domain 

Topology  of  equivalent  circuit  is  a  sensitive  factor  of  the 
combined  model  accuracy  and  a  given  topology  may  not  be 
suitable  for  different  geometry  and  frequency  range.  In 
order  to  develop  an  accurate  model,  which  can  be 
represented  more  efficiently  in  both  time  and  frequency 
domain  simulation,  we  proposed  the  combined  SSE-NN 
modeling  approach. 


To  create  data  for  neural  network  training,  we  extract  the 
lumped  component  values  based  on  the  existing  equivalent 
circuit  through  a  set  of  measured/simulated  sample  pairs  of 
EM  data.  Considering  some  measurement  noise  in  the  EM 
data,  the  parameter  extraction  criterion  for  each  set  of  input 
geometry  is  defined  as  an  optimization  objective  fimction  as 

leT,  k=l 

This  objective  function  shows  that  adjusting  the  lumped 
component  values  gp  to  map  the  S-parameters  of  high- 
frequency  response  of  the  equivalent  circuit  best  match  the 
EM  data  in  the  interested  frequency  bandwidth.  Due  to  the 
complexity  of  the  error  fimction,  iterative  algorithms  are 
used  to  explore  the  lumped  component  values.  The 
optimization  algorithms  we  used  are  gradient  and  quasi- 
Newton  methods.  We  collected  the  lumped  component 
values  versus  geometrical/physical  parameters  as  neural 
network  training  data.  We  teach/train  a  MLP  to  learn  the 
relationships  between  equivalent  circuit  component  values 
and  geometry.  Let  g,,  be  a  vector  representing  gp  data  under 
given  san^le  a;/.  The  error  function  is  defined  as 

isTr  k=\ 

where  gpu  is  the  A:**  element  of  gpi.  After  training,  the  MLP 
can  accuratly  calculate  the  component  values  varied  with 
continous  geometry  for  the  given  equivalent  circuit.  The  last 
step  is  to  export  the  EC-NN  model  into  a  user  defined 
simulation  format,  e.g.,  SPICE  sub-circuit  netlist  format. 
The  EC-NN  model  includes  two  sections.  The  first  section 
is  a  set  of  mathematical  equations  to  represent  the  internal 
structure  of  neural  network  that  calculate  the  lumped 
component  values  based  on  different  geometry/physical 
inputs.  The  second  section  is  the  updated  equivalent  circuit, 
which  receives  the  element  values  from  MLP  outputs.  In  a 
circuit  simulator,  the  EC-NN  model  will  be  feed  by 
geometrical/physical  parameters  as  inputs.  The  MLP 
automatically  calculates  the  element  values  in  a  user  defined 
equivalent  circuit  and  supply  the  values  into  the  equivalent 
circuit  to  represent  EM  behavior  in  frequency  and  time 
domain. 


EM  data  of  an  embedded  passive  can  be  collected 
depending  on  different  geometrical/physical  parameters 
from  full  wave  EM  simulation/measurement.  For  a  given 
frequency  range,  we  can  use  transfer  functions  (polynomial 
rational  functions)  to  represent  the  electrical  behavior  (e.g., 
admittance  Y  matrix)  of  the  embedded  passives.  For  any 
two-port  embedded  passives,  the  following  three  transfer 
functions  are  adequate  to  represent  Yn,  Y21,  and  Y22, 
respectively. 


Hy(.s)  = 


Zjq  +  b^s  +  •  •  •  -F  b„_is"  *  +  bjjS" 

Oq  +ais  +  ---  +  a„_is’‘~'  +s" 


(8.a) 


H2(s) 


df)  +  +  •■  •  +  dp_^s''  *  +  dpS'' 

Og  +fliJ  +  "'-Fa„_is"“’  +s" 


(8.b) 


+  (8.e) 

ao+a^s  +  --  +  a„.iS  +s 

where  s  =  jot,  and  n  is  the  number  of  effective  order  of  the 
passive.  Let  us  define  a  real  coefficient  vector,  as  g,  =  {oq. 
^1,  ...  ^n~l*  ^0)  •••  ^0)  •*•  ^n} '  Dsing 

Space-mapping  concept  [6],  a  relationship  exists  between 
the  coefficients  and  geometricaFphysical  parameters. 
However,  the  relationship  would  be  highly  nonlinear  and 
too  complicated.  Therefore,  we  utilize  neural  network 
features  to  learn  the  highly  nonlinear  relationship  between 
the  coefficients  and  geometricaFphysical  parameters. 


In  the  coefficient  parameter  extraction  procedure,  we  used 
gradient  and  quasi-Newton  optimization  algorithms  to 
enforce  II(s)  to  best  match  EM  data.  The  objective  function 
was  defined  as 

MiTiy.^\Hk(.gy>(o)-d^\  (9) 

ieTrk=\ 

and  we  use  a  neural  network  to  learn  the  relationship 
between  coefficient  vector  and  EM  input  parameters  x, 

Sv  -  fANNi.^ 

We  used  the  center  point  of  input  space  as  the  initial  point 
to  optimize  the  coefficient  vector  values. 
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B.  State  Space  Equation  for  Time-Domain  Simulation 
Using  coefficients  in  (8),  we  can  define 


■  0  1  0  0  0 

0  0  1  •••  0  0 

:  :  :  0  : 

—  Oq  —  —02  0 

0  0  0  0  0 
0  0  •  0  0  0 


0  •••  0  0 

0  0  0 

0  •••  0  0 

1  0  0 

0  1  0 


B  = 


0  0 
0  0 


...  0 

1  0  0 

0  0  0 


-«0 


D- 


-Ol 

'b„ 


(11) 


da  —  ■  ■  ■  d„_]  ~Ci„-\d„  Cg—  OgC^ 


d„-i-a„-,d„ 


J2>an 


to  form  the  state  space  equation, 
fx(t)  =  Ax(t)  +  Bu(t) 

\yit)  =  Cx(t)  +  Duit) 

where  x(t)  is  a  vector  of  internal  states,  «  and  y  are  vectors 
of  the  input  and  output  signals,  e.g,,  input  voltages  and 
output  currents  of  the  embedded  passive  respectively.  Our 
combined  model  can  be  then  implemented  into  a  time 
domain  circuit  simulator  using  the  state  space  equation  (12) 
or  into  a  frequency  domain  circuit  simulator  using  (8). 

C.  Stability  and  passivity 

To  assure  stability  requirement  in  time  domain  simulation, 
the  poles  of  the  combined  SSE-NN  model  need  to  be  on  left 
half  plane  (LHP)  [15],  To  enforce  all  the  poles  of  the 
transfer  functions  of  embedded  passives  to  be  into  LHP,  we 
added  a  set  of  constraints  in  the  parameter  extraction  as 


Peye^^rder=\\P2i  \  whcrc  Pi,=(5'*+^2fS+^j,)  and  T  =  nil,  if 
1=1 

k2i>0  &  k3i>0;  all  of  real  and  complex  roots  in  LHP. 

T 

Podd-order=  Pi 'YIPh  i  where  P,=(_s  +k,)  and  T  =  {n-\)l2,  if 
1=1 

ki>0  ,  k2i>0  cS:  k3,>0\  all  of  real  and  complex  roots  in  LHP. 


where  k  ={ki,  k2i,  k3i,  ...  k2T,  ^jr}  is  a  vector  of 
components  that  lead  to  elements  in  the  matrix  A.  For 
example,  in  a  3^^  order  combined  model,  the  denominator 
coefficients  are  defined  as  ag=k^  k3  \  a,  =  A:,  •  A:2  +  ;  and 

Oj^k^+k^,  respectively. 


The  criterion  for  passivity  can  be  defined  if  the  eigenvalues 
of  G  =  Re{Y}  are  positive  [15,  16].  This  condition  can  be 
assured  if  >’i2>'2i  -  >’n>'22>  where  the  y,*  (/>^  ~  1>2)  are  real 
parts  of  the  Y  matrix  elements.  It  has  been,  used  as  an 


optimization  constraint  in  the  gy  parameter  extraction 
procedure. 

The  above  criterions  are  added  in  the  parameter  extraction 
to  ensure  that  the  rational  functions  not  only  accurately 
represent  EM  behavior  but  also  enforce  the  time  domain 
model  to  be  stable  and  passive. 

D.  Structure  of  the  Combined  SSE-NN  Model 

Our  combined  SSE-NN  model  is  a  hierarchical  structure 
with  two  levels.  At  the  lower  level,  a  neural  network  maps 
the  geometrical/physical  parameters  into  g,  vectors.  At  the 
higher  level,  we  insert  the  coefficient  vectors  into  the  state 
equations  to  compute  the  EM  response  in  frequency  or  time 
domain  simulation.  Fig.  1  shows  the  structure  of  the 
combined  model  for  both  EC-NN  and  SSE-NN . 

For  eircuit  CAD  tools  in  time  domain,  we  export  our  SSE- 
NN  into  SPICE  sub-circuit  format.  The  lower  neural 
network  will  be  described  by  a  set  of  mathematical 
equations,  which  calculate  the  coefficient  values  based  on 
different  geometrical/physical  parameters  and  pass  them 
into  higher  level.  The  equivalent  circuit  can  be  generated 
from  (1 1)  and  (12). 


Test 


Y-/S-parametersl 


Y-/S-parameters  from  combined  model 


EM 

Data 


E.C  or  S.S.E 


Neural 

Network 


Parameter 
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|.4 - Vector 

gpOTgy 


Refined 

Training 


k  : 

Neural 

Network 
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1  ^ 

Geometrical/Physical  Input  Parameters  x, 


Figure  1.  Structure  of  the  combined  EC-NN  and  SSE-NN 
models  illustrating  the  model  development  process  and  the 
testing  phase.  E.C.  and  S.S.E.  represent  equivalent  circuit 
and  state  space  equation  respectively. 


E.  Combined  SSE-NN  Model  Development 

EM  data  has  component’s  geometrical/physical  parameters 
and  frequency  as  inputs  and  S-parameters  as  outputs.  The 
next  phase  is  parameter  extraction,  which  is  carried  out  for 
each  geometry  over  the  entire  frequency  range.  The 
objective  here  is  to  determine  the  coefficient  values  that 
best  fit  the  original  EM  data.  Different  geometrical 
parameter  values  and  their  corresponding  coefficient  values 
are  then  re-arranged  into  neural  network  training  data.  A  3- 
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layer  MLP  neural  network  is  trained  using  quasi-Newton 
algorithm  in  NeuroModeler  [17].  For  any  given  geometrical 
dimensions  of  the  component  within  the  range  of  the 
training  data,  the  trained  MLP  can  predict  the  elements  of 
vector  g,.  We  combine  the  state  equation  with  the  neural 
model  using  our  hierarchical  setup  to  obtain  the  overall 
combined  model.  The  inputs  to  the  combined  model  are  the 
geometrical  dimensions  of  the  embedded  component.  The 
intermediate  outputs  of  the  model  are  the  corresponding 
coefficient  vector  values.  The  final  outputs  of  the  combined 
model  are  component’s  EM  behavior,  e.g.,  S-parameters.  In 
the  test  phase,  an  independent  set  of  test  data  containing  S- 
parameters  versus  new  geometrical  parameter  values  (i.e., 
never  seen  during  training)  is  generated  using  the  EM 
simulator.  This  data  is  used  to  test  the  accuracy  of  the 
combined  model.  In  the  final  phase,  we  formulate  the 
combined  model  into  a  set  of  mathematical  expressions  to 
be  directly  used  to  carry  out  high-level  circuit  design  in 
time-domain  simulators. 

V.  Examples 

In  order  to  demonstrate  the  proposed  modeling  approach, 
we  developed  embedded  resistors  and  capacitors  in  EC-NN 
and  SSE-NN  models.  We  applied  the  SSE-NN  models  in 
signal  integrity  of  multilayer  circuit  design  to  efficiently 
perform  optimization  and  statistic  analysis. 

A.  Embedded  Resistor 

Accurate  modeling  of  EM  behaviors  of  embedded  passive 
used  in  high-speed  multilayer  printed  circuit  board  is 
important  for  efficient  high-speed  circuit  design.  In  this 
example,  a  combined  EC-NN  model  of  an  embedded 
resistor  shown  in  Fig.  2  is  developed.  The  EM  data  of  the 
embedded  resistor  is  automatically  generated  from  EM 
simulation  of  Sonnet  [18].  Length  (L)  and  width  (W)  are 
used  as  inputs.  The  outputs  are  real  and  imaginary  parts  of 
Sii  and  S21  in  the  EM  data.  Fig.  3  shows  the  structure  of  the 
EC-NN  model  for  the  embedded  resistor,  which  includes  an 
equivalent  circuit  and  a  3 -layer  MLP  neural  network. 


Because  the  neural  network  can  provide  the  accurate 
component  values  continuously  varied  with  geometry  for 
the  equivalent  circuit,  the  combined  EC-NN  model  can  be 
in  place  of  the  computationally  intensive  physical/EM  model 
to  efficiently  provide  EM  effects  in  optimization  and  statistic 
design. 


Figure  3.  The  structure  of  the  combined  EC-NN  model  for 
embedded  resistors.  The  equivalent  circuit  is  user  defined. 


(a) 


Figure  2.  3-D  physical  structure  of  embedded  resistor. 


The  neural  network  is  trained  to  learn  the  relationship  about 
the  input  geometry  and  the  four  lumped  component  values 
(Rl,  R2,  Cl,  C2).  After  the  MLP  is  well  trained,  it  can 
accurately  calculate  the  component  values  based  on  any 
within  geometrical/physical  parameters  for  the  given 
equivalent  circuit  even  the  parameters  was  never  used  in 
training.  Testing  is  performed  by  comparing  the  outputs  of 
the  overall  EC-NN  model  and  EM  data,  shown  in  Fig.  4(a). 


(b) 

Figure  4.  Comparison  of  real  part  of  S21  of  embedded  resistor 
EC-NN  model  outputs  (a)  or  SSE-NN  model  outputs  (b)  and 
independent  EM  data  which  was  never  used  in  training.  Curves 
A  are  generated  based  on  W  =  1.346  and  L  =  0.279  mm. 
Curves  B  are  generated  based  on  W  =  0.99  and  L  =  0.254  mm. 
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The  test  error  of  combined  EC-NN  model  is  5.8%.  Further 
improvement  of  accuracy  requires  new  topology  of 
equivalent  circuit.  Instead  of  using  human  based  trial  and 
error  process,  we  use  the  proposed  SSE-NN  modeling 
method.  As  the  equivalent  circuit  for  the  embedded  resistor 
uses  three  capacitors,  a  3"*  order  transfer  function  can 
express  the  behavior  of  the  embedded  resistor  in  the  SSE- 
NN  model. 

Table  I  shows  the  model  test  error,  which  we  achieved, 
based  on  various  orders  of  state  equations  in  SSE-NN 
modeling  development.  The  test  error  demonstrated  that  the 
optimal  number  of  internal  states  is  three.  In  4*  order 
model,  the  additional  internal  state  could  not  play  an 
important  role  in  the  EM  behavior  representation.  However, 
more  coefficients  are  needed  in  transfer  function,  more 
freedom  in  parameter  extraction  and  neural  network 
training. 

The  best  results  are  obtained  with  the  3"^  order  SSE-NN  model. 
The  agreement  between  3'^''  order  SSE-NN  model  and  EM 
data  is  achieved  even  though  the  independent  testing  data 
was  never  seen  in  training,  shown  in  Fig.  4(b).  To  verily 
stability  and  passivity,  the  three  LHP  poles  of  the  embedded 
resistor  model  are  -1.4411  and  -0.0144  ±  j0.0539,  and  the 
passivity  condition  is  satisfied  as  shown  in  Fig.  5. 

Table  I.  Comparison  of  resistor  SSE-NN  model  with  different 
order  formulations. 
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1.12% 
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Figure  5.  The  3'“*  order  SSE-NN  model  in  frequency- 
domain  simulation  and  yy*  {j,k  =  1,2)  are  real  part  of  the  Y 
matrix  elements.  The  W  is  swept  from  0.952mm  to 
1.397mm. 


B.  Embedded  Square  Capacitor 

The  physical  stracture  of  an  embedded  square  capacitor  is 
shown  in  Fig.  6.  The  input  parameters  include  length  (L), 
capacitor  dielectric  constant  (Ercap),  and  frequency.  Real  and 
imaginary  parts  of  S-parameters  are  generated  from  3D  full 
wave  EM  simulator,  Ansoft-HFSS  [19].  Fig.  7  shows  the 


equivalent  circuit  used  in  our  combined  EC-NN  model  for 
the  embedded  capacitor. 


Figure  6.  3-D  physical  layout  of  embedded  capacitor . 

The  neural  network  is  trained  to  learn  the  embedded 
capacitor  inputs  and  lumped  component  values.  For 
example,  Ll=0.035nH,  Cl=  1.135pF,  C2=0.537pF  when 
L=0 .736mm  and  £,,ap=17.5.  The  S-parameter  comparison 
between  the  EC-NN  model  and  original  EM  data  is  shown 
in  Fig.  8(a).  Table  II  illustrates  the  different  test  error, 
which  we  achieved,  based  on  varied  order  formulas  in  SSE- 
NN  modeling  development. 

The  optimal  transfer  function  is  3"''  order  to  represent  the 
EM  based  capacitor.  Testing  is  performed  by  comparing  the 
outputs  of  combined  SSE-NN  models  and  EM  data.  The 
agreement  between  our  3'^'^  order  SSE-NN  model  and  EM 
data  is  obtained  even  though  the  independent  testing  data 
was  never  seen  in  training,  shown  in  Fig.  8(b). 


Figure  7.  The  combined  EC-NN  model  structure  for 
embedded  capacitor.  The  equ  ivalent  circuit  is  user  defined. 
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(b) 


Figure  8.  Comparison  of  real  part  of  S21  of  embedded 
capacitor  EC-NN  model  outputs  (a)  or  SSE-NN  model  outputs 
(b)  and  independent  EM  data.  Curves  A  and  B  are  generated 
based  on  inputs  L  =  0.736mm  and  L  =  0.787mm  respectively. 

Table  II.  Comparison  of  capacitor  SSE-NN  model  with 
different  order  formulations. 


Order 

Test  Error 

2"“ 

2.20% 

3rd 

1.67% 

4“ 

2.57% 

3.  Signal  Integrity  Example 

To  further  confirm  the  validity  of  the  proposed  combined 
model  in  time-domain,  we  plugged  the  above  resistor  and 
square  capacitor  SSE-NN  models  into  a  time-domain 
simulator,  i.e.,  Hspice  [20]  to  perform  circuit  simulation  and 
optimization  including  geometrical  and  physical  parameters 
of  the  embedded  passives.  The  models  help  to  achieve  a 
convenient  link  between  EM  behaviors  and  high-level 
circuit  design,  improving  design  accuracy  and  efficiency.  In 
this  paper,  we  use  signal  integrity  of  multilayer  circuit  as 
shown  in  Fig.  9,  where  the  length  and  width  of  embedded 
resistor  and  length  and  dielectric  constant  of  embedded 
capacitor  are  adj  ustable. 


Output  buffer 


3 


Figure  9.  Three  dimensional  illustration  of  signal  integrity 
of  multilayer  circuit  with  embedded  resistor  and  capacitor. 


Figure  10.  Comparison  of  signal  from  input  buffer,  and  output 
signals  before  and  after  combined  SSE-NN  models 
optimization. 

The  optimization  used  136  iterations  including  repetitive 
evaluation  of  combined  SSE-NN  models  to  reach  the 
criteria  of  the  optimization  goal  and  the  total  computation 
time  based  on  our  combined  SSE-NN  models  is 
3.75minutes.  The  results  show  that  the  combined  models 
provide  possibility  to  adjust  the  geometry  of  embedded 
passives  in  high-frequency  circuit  design.  Because  we  used 
neural  models  to  learn  the  nonlinear  relationship  between 
geometry  and  coefficient  vectors,  the  geometry  becomes 
variable  in  circuit  design. 

We  also  performed  statistical  analysis  of  the  signal  integrity 
circuit  with  our  SSE-NN  models  in  a  three-coupled 
transmission  line  circuit  as  shown  in  Fig.  11.  Monte-Carlo 
analysis  of  signal  integrity  curves  with  geometrical 
parameters  as  statistical  design  variables  are  shown  in  Fig. 
12.  The  total  simulation  time  for  500  output  curves  based  on 
the  geometry  tolerance  around  the  nominal  design  center  is 
8.24  minutes  using  proposed  neural  models  by  Hspice. 
However,  the  required  time  of  Ansoft-HFSS  for  500 
different  geometry  is  more  than  8  hours.  The  proposed 
combined  models  retain  the  advantages  of  neural  network 
learning,  speed,  and  accuracy,  and  provide  EM  effects  in 
high-level  circuit  design. 


Measurement 


In  optimization  process,  whenever  optimization  changes  the 
geometry,  the  corresponding  combined  models  are  called 
with  the  new  geometrical  dimensions  as  inputs.  From  output 
comparison,  as  shown  in  Fig.  10,  the  output  curves  have 
been  improved  in  terms  of  distortion  and  time  delay. 


Figure  11.  The  three  coupled  transmision  line  circuit. 
H  :  EM  capacitor  SSE-NN  model; 

H  :  EM  resistor  SSE-NN  model 
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Time  (ns) 

Figure  12.  Output  of  Monte-Carlo  analysis  of  the  3-coupled 
transmission  line  example  using  SSE-NN  models  of 
embedded  passives.  Here  20  randomly  chosen  curves  are 
shown  out  of  500  simulations  of  the  circuit  of  Fig.  11. 

VI.  Conclusions 

In  this  paper,  we  presented  a  new  method  for  modeling 
embedded  passives  suitable  for  both  frequency  and  time 
domain  simulation.  The  combined  models,  which  utilize 
neural  network  and  equivalent  circuit  or  state  space 
equation  techniques,  are  developed  from  EM  data. 

The  accuracy  of  the  combined  EC-NN  model  will  depend 
on  the  equivalent  circuit  in  the  combined  model  for  the 
entire  frequency  range.  If  the  accurate  and  reliable 
equivalent  circuit  is  available,  EC-NN  will  be  generated 
efficiently,  because  the  number  of  lumped  elements  in 
equivalent  circuit  is  less  than  the  number  of  coefficient 
values  in  state  space  equations. 

In  combined  SSE-NN  model  development,  we 
automatically  generate  an  accurate  solution  for  modeling 
embedded  passives,  avoiding  human  based  trial  and  error 
process  in  conventional  approach.  The  combined  SSE-NN 
modeling  technique  acts  as  a  bridge  to  combine  slow 
physical  EM  model  and  fast  equivalent  circuit  model  to 
together.  In  high-speed  circuit  design,  the  combined  neural 
models  allow  geometrical/physical  parameters  to  become 
design  variables  in  circuit  simulation.  Therefore, 
manufacture  geometrical  tolerance  can  be  taken  into 
account  in  circuit  design  efficiently  and  accurately. 
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Abstract — This  paper  presents  a  multiclass,  multilabel  Im¬ 
plementation  of  Least  Squares  Support  Vector  Machines  (LS- 
SVM)  for  DOA  estimation  in  a  CDMA  system.  For  any 
estimation  or  classification  system  the  algorithm’s  capabilities 
and  performance  must  be  evaluated.  This  paper  includes  a  vast 
ensemble  of  data  supporting  the  machine  learning  based  DOA 
estimation  algorithm.  Accurate  performance  characterization  of 
the  algorithm  is  required  to  justify  the  results  and  prove  that 
multiclass  machine  learning  methods  can  be  successfully  applied 
to  wireless  communication  problems.  Thel  earning  algorithm 
presented  in  this  paper  includes  steps  for  generating  statistics 
on  the  multiclass  evaluation  path.  The  error  statistics  provide 
a  confidence  level  of  the  classification  accuracy. 


I.  Introduction 

Machine  leamingr  esearch  has  largely  been  devoted  to 
binary  and  multiclass  problems  relating  to  data  mining,  text 
categorization,  and  pattern  recognition.  Recently,  machine 
learning  techniques  have  been  applied  to  various  problems 
relating  to  cellular  communications,  notably  spread  spectrum 
receiver  design,  channel  equalization,  and  adaptive  beam¬ 
forming  with  direction  of  arrival  estimation  (DOA).  In  our 
research  we  present  a  machine  learning  based  approach  for 
DOA  estimation  in  a  CDMA  communication  system  [1]. 
The  DOA  estimates  are  used  in  adaptive  beamfotming  for 
interference  suppression,  a  critical  component  in  cellular 
systems. I  nterference  suppression  reduces  the  multiple  access 
interference  (MAI)  which  lowers  the  required  transmit  power. 
The  interference  suppression  capability  directly  influences 
the  cellular  system  capacity,  i.e.,t  he  number  of  active  mobile 
subscribers  per  cell. 

Beamforming,  tracking,  and  DOA  estimation  are  current 
research  topics  with  various  technical  approaches.  Least 
mean  square  estimation,  Kalman  filtering,  and  neural  net¬ 
works  [2],[3],[4],  have  been  successfully  applied  to  these 

Sandia  is  a  multiprogram  laboratory  operated  by  Sandia  Corporation,  a 
Lockheed  Martin  Company,  for  theU  nited  States  Department  of  Energy 
under  Contract  DE-AC04-94AL85000. 


problems.  Many  approaches  have  been  developed  fore  alcu- 
lating  the  DOA;  three  techniques  based  on  signal  subspace 
decomposition  are  ESPRIT,  MUSIC,  and  Root-MUSIC  [1]. 

Neural  networks  have  been  successfully  applied  to  the 
problem  of  DOA  estimation  and  adaptiveb  eamforming  in 
[4],  [5],  [6].  New  machine  learning  techniques,  such  as 
support  vector  machines(  SVM)  and  boosting  [7],  perform 
exceptionally  well  in  multiclass  problems  and  new  op¬ 
timization  techniques  are  published  regularly.  These  new 
machine  learning  techniques  have  the  potential  to  exceed 
the  performance  of  then  eural  network  algorithms  relating 
to  communication  applications. 

The  machine  learning  methods  presented  in  this  paper 
include  subspace  based  estimation  applied  to  the  sample 
covariance  matrix  of  the  received  signal.  The  one-vs-one 
multiclass  LS-SVM  algorithm  uses  both  training  data  and 
received  data  to  generate  the  DOA  estimates.  The  end  result 
is  an  efficient  approach  for  estimating  the  DOAs  in  CDMA 
cellular  architecture  [1]. 

This  paper  is  organized  as  follows.  Section  II  presents 
the  system  models  for  an  adaptive  antenna  array  CDMA 
systems.  A  review  of  binary  and  multiclass  machine  learning 
methods  is  presented  in  Section  III,  along  with  background 
information  on  the  LS-SVM  algorithm.  Section  IV  includes 
a  brief  review  of  classic  DOA  estimation  algorithms  and 
the  elements  of  a  machine  learning  based  DOA  estimation 
algorithm.  Section  V  presents  a  one-vs-onem  ulticlass  LS- 
SVM  algorithm  for  DOA  estimation  and  simulation  results 
are  presented  in  Section  VI.  Section  VII  includes  a  compar¬ 
ison  between  standard  DOA  estimation  algorithmsa  nd  our 
machine  learning  based  algorithm. 

II.  System  Models 

This  section  includes  an  overview  of  system  models  for 
the  received  signal  and  adaptive  antenna  arrays  designs. 
All  notation  is  described  below  and  is  consistently  used 
throughout  the  paper. 
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A.  Received  Signal  at  Antenna  Array  output 

The  baseband  signal,  va  {t),  from  the  antenna  array  is 


taH)  =  As{t)+nr{t).  (1) 

A  =  a(02)  ...  a(0^)  ]  (2) 

a(6>,)  =  [  1  ...  ]^(3) 

s{t)  =  [  Si  (n)  S2  (n)  ...  sl  (n)  ]  (4) 

Si  {t)  =  Vpu  {t)q‘bi  {t) ,  for  path  I,  (5) 


where  r^i  (f)  is  the  received  signal  of  mobile  i,  A  is  a 
D  X  L  array  steering  vector  for  D  antenna  elements  and 
L  transmission  paths,  s  (t)  is  the  L  x  1  received  base¬ 
bands  ignal  at  the  output  of  the  matched  filter,  a(0()  = 
[  1  e~^^‘  ...  is  the  £)  X  1  steering 

vector,  ki  =  ^sin^i,  v  is  the  spacing  between  antenna 
elements,  Wc  is  the  carrier  frequency,  c  is  the  velocity  of 
propagation,  ^iis  the  direction  of  arrival  of  the  I  signal, 
Pti  {t)  is  the  transmit  signal  power  from  mobile  i,  q\  is  the 
attenuation  due  to  shadowing  from  path  I,  bi  (i)  is  the  data 
stream  of  mobile  i,  and  iir  (t)  is  the  additive  noise  vector. 

To  ease  the  complexity  of  the  notation  the  terms  relative 
to  the  multiple  paths  are  combined  as 

=  (6) 

i=i 

In  [8]  Zi  is  defined  as  the  spatial  signature  of  the  antenna 
array  to  the  mobile. 

III.  Support  Vector  Machines  -  Background 

A  major  machine  learning  application,  pattern  classifi¬ 
cation,  observes  input  data  and  applies  classification  rules 
to  generate  a  binary  or  multiclass  labels.  In  the  binary 
case,  a  classification  fimction  is  estimated  using  input/output 
training  pairs,(xi,2/i)  z  =  1 . .  .n,  with  unknown  probability 
distribution,  P(x,  y), 

(xi,  2/1 ),...,  (x„,i/„)  €  X  y,  (7) 

Vi  =  {-Ij  +1}  ■  (8) 

The  estimated  classification  function  maps  the  input  to  a 
binary  output,  /  :  R^  — ►  {— 1,-1-1}.  The  system  is  first 
trained  with  the  given  input/output  data  pairs  then  the  test 
data,  taken  from  the  same  probability  distribution  P(x,  y),  is 
applied  to  the  classification  function.  For  the  multiclass  case 
Y  e  R®  where  V  is  a  finite  set  of  real  numbers  and  G  is  the 
size  of  the  multiclass  label  set.I  n  multiclass  classification  the 
objective  ist  o  estimate  the  function  which  maps  the  input 
data  to  a  finite  set  of  output  labels  /  :  R^  — »  S  (R^)  €  R*^ 

Support  Vector  Machines  (SVMs)  were  originally  de¬ 
signed  for  the  binary  classification  problem.  Much  like 
all  machine  learning  algorithmsS  VMs  find  a  classification 
function  that  separates  data  classes,  with  the  largest  margin. 


using  a  hyperplane  .  The  data  points  near  the  optimal  hyper¬ 
plane  are  the  “support  vectors”.  SVMs  are  a  nonparametric 
machine  learning  algorithm  with  the  capability  of  controlling 
the  capacity  through  the  support  vectors. 

A.  Kernel  Functions 

The  kernel  based  SVM  maps  the  input  space  into  a  higher 
dimensional  feature,  F,  space  via  a  nonlinearm  apping 

r  :  R^  F  (9) 

X  r(x).  (10) 

The  data  does  not  have  the  same  dimensionality  as  the  feature 
space  since  the  mapping  process  is  to  a  non-unique  general¬ 
ized  surface  [9].  The  dimension  of  the  feature  space  is  not  as 
important  as  the  complexity  of  the  classification  functions. 
For  example,  in  the  input  space,  separating  the  input/oufput 
pairsm  ay  require  a  nonlinear  separating  function,  but  in  a 
higher  dimension  feature  spaee  the  input/output  pairs  may  be 
separated  with  a  linear  hyperplane.  The  nonlinear  mapping 
function  r(xi)  is  related  to  kernel,  k(x,Xi)  by 


r(x)T(xi)  =lk(x,Xi).  (11) 


Four  popular  kernel  functions  are  the  linear  kernel,  poly¬ 
nomial  kernel,  radial  basis  function  (RBF),  and  multilayer 
perceptrons  (MLP). 


linear,  k  (x,  x^) 
polynomial,  k  (x,  x^) 

RBF,  k(x,  Xi) 


MLP,  k(x,Xj) 


X  •  Xj 


{(x.xO+0)'' 


tanh  {k  (x  ■  x,)  -1-0) 


(12) 

(13) 

(14) 

(15) 


The  performance  of  each  kernel  function  varies  with  the 
characteristics  of  the  input  data.  Refer  to  [10]  for  more 
information  on  feature  spaces  and  kernel  methods. 


B.  Binary  Classification 

In  binary  classification  systems  the  machine  learning  algo¬ 
rithm  generate  the  output  labels  with  a  hyperplane  separation 
where  yi  e  [—1,1]  represents  the  classification  “label”  of  the 
input  vector  x  .  The  input  sequence  and  a  set  of  training 
labels  are  represented  as  {xi,2/i}"=i  >  Vi  =  {“!)  +1}  ■ 
two  classes  are  linearly  separable  in  the  input  spaee  then  the 
hyperplane  is  defined  as  w^x-bh  =  0,w  is  a  weight  vector 
perpendicular  to  the  separating  hyperplane,  b  isa  blast  hat 
shifts  the  hyperplane  parallel  to  itself.  If  the  input  space  is 
projected  into  a  higher  dimensional  feature  spacet  hen  the 
hyperplane  becomes  w^F  (x)  -1-6  =  0. 

The  SVM  algorithm  is  based  on  the  hyperplane  definition 

[11]. 

2/i  [w^r (Xi) -1-6]  >  l,i  =  l,...,iV. 


(16) 
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Given  the  training  sets  in  (7)  the  binary  support  vector 
machine  classifieri  s  defined  as 

r  N  1 


y  (x)  =  sign 


^aij/ik(x,Xi)  +6 


(17) 


,i=l 


The  non-zero  a-s  are  “support  values”  and  the  corresponding 
data  points,  Xj,  are  the  “support  vectors”.  Quadratic  pro¬ 
gramming  is  one  method  ofs  olving  for  the  ajs  and  b  in  the 
standard  SVM  algorithm. 


C.  Multiclass  Classification 


For  the  multiclass  problem  the  machine  learning  algorithm 
produces  estimates  with  multiple  hyperplane  separations. 
The  set  of  input  vectors  and  training  labels  is  defined  as 
,  Xi  e  R",  2/i  G  {1, . . . , G}  ,  n  is  the  index 
of  the  training  pattern  and  C  is  the  number  of  classes.  There 
exist  many  SVM  approaches  to  multiclass  classification  prob¬ 
lem.  Two  primary  multiclass  techniques  are  one-vs-one  and 
one-vs-rest.  One-vs-onea  pplies  SVMs  to  selected  pairs  of 
classes.  For  C  distinct  classes  there  are  hyperplanes 


that  separate  the  classes.  The  one-vs-rest  SVM  technique 
generates  C  hyperplanes  that  separate  each  distinct  class 
from  the  ensemble  of  the  rest.  In  this  paper  we  only  consider 
the  one-vs-one  multiclass  SVM. 

Platt,  et.al.,  [12]  introduced  the  decision  directed  acyclic 
graph  (DDAG)  and  a  Vapnik-Chervonenkis  (VC)  an^ys|s 
of  the  margins.  The  DDAG  technique  isb  ased  on  ^ 

classifiers  fora  C  class  problem,  one  node  for  each  pair  of 
classes.  In  [12]  it  is  proved  that  maximizing  the  mar^ns  at 
each  node  of  the  DDAG  will  minimize  the  generalization 
error.  The  performance  benefit  of  the  DDAG  architecture, 
is  realized  when  the  i*'*  classifier  is  selected  at  the 
node  and  the  j*'*  class  is  eliminated.  Refer  to  Figure  1  for  a 
diagram  of  a  fourc  lass  DDAG. 


Fig.  I.  Fourc  lass  DDAG  for  one-vs-one  multiclass  LS-SVM  based  DOA 
estimation. 


D.  Least  Squares  SVM 

Suykens,  et.al.,  [13]  introduced  the  LS-SVM  which  is 
based  on  the  SVM  classifier,  refer  to  equation  (17) .  The  LS- 
SVM  classifier  is  generated  from  the  optimization  problem: 

min  £ls  (w,(?!')  = ||w||^ -I- -7^ (f>?,  (18) 

w,6,^  ^  ^  i=l 

7  and  are  the  regularization  and  error  variables,  respec¬ 
tively.  The  minimization  in  (18)includes  the  constraints 

j/i  [w^r(xi) -1-6]  >  1  —  (/li,  i  =  (19) 

The  LS-SVM  includes  one  universal  parameter,  7,  that 
regulates  the  complexity  of  the  machine  learning  model.T  his 
parameter  is  applied  to  the  data  in  the  feature  space,  the 
output  of  the  kernel  function.  A  small  value  of  7  minimizes 
the  model  complexity,  while  a  large  value  of  7  promotes 
exact  fitting  to  the  training  points.  The  error  variable 
allows  misclassifications  foro  verlapping  distributions  [14]. 
The  Lagrangian  ofe  quation  (18)  is  defined  as 

^LS  (w,6,(f),  q)  =  (^®) 

{vi  [w'^r  (xi)  -1-6]  ~  1  -f 

1  =  1 

where  Qj  are  Lagrangian  multipliers  that  can  either  be 
positive  or  negative.  The  conditions  of  optimality  are 


dZLS 

dw 

dZis 

db 


n 

0,  w  =5^0:12/1^  (xi) 

i=l 

n 

0,  =  0 

i=l 


(21) 

(22) 


dZi,s 

d4> 

dZis 

dai 


0,  ai  =  -ycpi  (23) 

0,  Vi  [w^r  (xi)  -f  6]  -  1  -f  (/>i  =  0  (24) 


A  linear  system  can  be  constructed  from  equations  (21) 


(24)  [13], 


■  7  0  0  ■ 

W 

'  0  ' 

000 

h 

0 

0  0  7/  —I 

<t> 

0 

Z  Y  I  0 

a 

_  T  _ 

z  =  [r(xi)^i/i,...,r(x„)^2/„] 

V  =  [l/lj  •  •  ■  1 1/n]  !  1  —  [l)-'-il-] 

(j>  =  a=[au...,an] 


(25) 


(26) 

(27) 

(28) 


By  eliminating  weight  vector  w  and  the  error  variable  4>,  the 
linear  system  is  reduced  to: 


■  0 

'  h  ' 

■  0 

Y  ZZ^  +  'y-^I  _ 

OL 

T 

(29) 
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In  the  linear  systems  defined  in  (25) -(29)  the  support  values 
ai  are  proportional  to  the  errors  at  thed  ata  points.  In  the 
standard  SVM  case  many  of  these  support  values  are  zero, 
but  most  of  the  least  squares  support  values  are  non-zero.  In 
[13]  a  conjugate  gradient  method  isp  roposed  for  finding  b 
and  a,  which  are  required  for  the  SVM  classifier  in  equation 
(17). 

IV.  Algorithms  for  DOA  Estimation 

Two  primary,  classic  methods  for  subspace  based  DOA 
estimation  exist  in  literature.  Multiple  Signal  Classification 
(MUSIC)  [15]  and  Estimation  ofS  ignal  Parameters  Via  Ro¬ 
tational  Invariance  Techniques  (ESPRIT)  [16].  The  MUSIC 
algorithm  is  based  on  the  noise  subspace  and  ESPRIT  is 
based  on  the  signal  subspace. 

Many  computational  techniques  exist  for  working  through 
limitationso  f  DOA  estimation  techniques,  but  currently  no 
techniques  exist  for  as  ystem  level  approach  to  accurately 
estimating  the  DOAs  at  the  base  station.  A  number  of  lim¬ 
itations  relating  to  popular  DOA  estimation  techniques  are: 

1)  the  signals  ubspace  dimension  is  not  known,  many  papers 
assume  that  it  is.T  he  differences  between  the  covariance  ma¬ 
trix  and  the  sample  covariance  matrix  add  to  the  uncertainty, 

2)  searching  all  possible  angles  to  determine  the  maximum 
response  of  the  MUSIC  algorithm,  3)  evaluating  the  Root- 
MUSIC  polynomial  on  the  unit  circle,  4)  multiple  eigen 
decompositions  for  ESPRIT,  5)  computational  complexity  for 
maximum  likelihood  method.  The  capabilities,  in  terms  of 
resolution  and  computational  requirements,  of  these  standard 
DOA  estimation  algorithms  serve  as  the  benchmark  for  the 
machinel  earning  based  DOA  estimation.  Refer  to  Section 
VII  for  a  comparison  between  standard  DOA  estimation 
algorithms  and  the  one-vs-one  multiclass  LS-SVM  DOA 
estimation  algorithm. 

A.  Machine  Learning  for  DOA  Estimation 

To  estimate  the  antenna  array  response,  Zj  = 
know  a(0i)  and  qj.  The  contin¬ 
uous  pilot  signal,  includedi  n  cdma2000,  can  be  used  in 
estimating  q‘.  This  must  be  done  for  each  resolvable  path, 
i.e.,  qi  =  [  qh  ^Qi  ]■  Estimating  A {6)  = 

[  a  (^i) ,  a  {62) ,  ■■■  ,  a  (01.)  I  requires  information  on 

the  DOA. 

The  process  of  DOA  estimation  is  to  monitor  the  outputs 
of  D  antenna  elements  and  predict  the  angle  of  arrival  of  L 
signals,  L<D.  The  output  matrix  from  the  antenna  elements 
is 

A  =  [  a (61)  a (02)  ■■■  a(^L)  ]  (30) 

a(di)  =  [  1  ...  ]^, 

and  the  vector  of  incident  signals  is  0r  = 
[01,  02,  ■■■  ,0l]-  With  a  training  process. 


the  learning  algorithms  generate  DOA  estimates, 
dr  01,  02,  ■■■  ,0l]>  based  on  the  responses  from 

the  antenna  elements,  a(0i). 

For  thep  roposed  machinel  earning  technique  there  is  a 
trade-off  between  the  accuracy  of  the  DOA  estimation  and 
antennaa  rray  beamwidth.  An  increase  in  DOA  estimation 
accuracy  translates  into  a  smaller  beamwidth  and  a  reduction 
in  MAI.  Therefore  the  accuracy  in  DOA  estimation  directly 
influences  the  minimum  required  power  transmitted  by  the 
mobile.  There  should  be  a  balance  between  computing  effort 
and  reduction  in  MAI. 

V.  LS-SVM  DDAG  BASED  DOA  ESTIMATION 
Algorithm 

In  this  paper  we  propose  a  multiclass  SVM  algorithm 
trained  with  projection  vectors  generated  from  the  signal 
subspace  eigenvectors  and  the  sample  covariance  matrix.T  he 
output  labels  from  the  SVM  system  are  the  DOA  estimates. 

The  one-vs-one  multiclass  LS-SVM  DDAG  technique  for 
DOA  estimation  is  trained  for  C  DOA  classes.  The  DDAG 
tree  is  initialized  with  nodes.  Therefore 

one-vs-one  LS-SVMs  are  trained  to  generated  the  hyper¬ 
planes  with  maximum  margin.  For  each  class  thet  raining 
vectors,  x„,  areg  enerated  from  thee  igenvectors  spanning 
the  signal  subspace.  The  number  of  classes  is  dependent 
upon  on  the  antenna  sectoring  and  required  resolution.  For  a 
CDMA  system  the  desired  interference  suppression  dictates 
the  fixed  beamwidth.  CDMA  offers  this  flexibility  since 
the  all  mobiles  use  the  same  carrier  frequency.  For  FDMA 
systems  a  narrow  beamwidth  is  desired,  since  frequency 
reuse  determines  the  capacity  of  a  cellular  system. 

Thes  ignal  subspace  eigenvectors  of  the  received  signal 
covariance  matrix  are  required  for  accurate  DOA  estimation. 
For  a  CDMA  system  with  adaptive  antenna  arrays  the 
covariance  matrix  of  the  received  signal  is 

Rrr=E[rAr^].  (31) 

In  our  machine  learning  based  DOA  estimation  algorithm 
the  principal  eigenvectors  must  be  calculated.  Eigen  decom¬ 
position  (ED)  is  the  standard  computational  approach  for 
calculating  the  eigenvalues  and  eigenvectors  of  a  the  co- 
variance  matrix.  ED  is  a  computationally  intense  technique, 
faster  algorithmss  uch  asP  ASTd  [17]  have  been  developed 
forr  eal-time  processing  applications. 

For  the  LS-SVM  based  approach  to  DOA  estimation 
the  output  of  the  receiver  is  used  to  calculate  the  sample 
covariance  matrix  of  the  input  data  signal  (k) , 

=  i  E  ^A{k)r^(k).  (32) 

k=K-M+l 

The  dimension  of  the  observation  matrix  is  D  x  M,  M  is 
ideal  sample  size  (window  length),  and  the  dimension  of  the 
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TABLE  I 

Projection  coefficients  for  machine  learning  based  power 
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sample  covariance  matrix  is  D  x  D.  The  principal  eigen¬ 
vectors,  vi , . . . ,  V£),  are  calculated  via  eigen  decomposition 
(ED)  or  subspacet  racking  techniques.^Each  eigenvector  is 
used  to  calculate  a  covariance  matrix,  , . . . ,  R^ud  • 

The  algorithm  requires  only  the  set  of  estimated  eigenvec- 
torsf  rom  the  sample  covariance  matrix,  which  are  used  to 
generate  projection  coefficients  for  the  classification  process. 
The  projection  vectors  are  generated  from  the  projection 
of  Rt,„j,  1  <  d  <  D,  onto  the  primary  eigenvector  of 
the  signal  subspace.  In  the  training  phaset  he  hyperplanes 
at  each  DDAG  node  are  constructed  with  thesep  rejection 
vectors.  In  the  testing  phase  R„„,,  is  generated  from  the 
received  signal  r^i  (fc)  and  the  principal  eigenvectors.  Then 
the  projection  coefficients  for  the  i*'' /j^  node  of  the  DDAG 
are  computed  with  dot  products  of  Ru^d  and  the 
training  eigenvectors.  Thisn  ew  set  of  projection  vectorsi  s 
testing  with  the  hyperplane  generated  during  the 

training  phase.  The  DOA  labels  are  then  assigned  based  on 
the  DDAG  evaluation  path.  A  similar  projection  coefficient 
technique  has  been  successfully  applied  to  a  multiclass  SVM 
facialr  ecognition  problem  presented  in  [18].T  able  I  includes 
threes  ets  of  projection  vectors,  each  set  corresponds  to  a 
different  DOA.  From  a  review  of  the  data  it  is  evident  that  the 
classes  are  not  linearly  separable.  The  data  must  be  projected 
to  a  higher  dimension  feature  space  and  tested  against  the 
separating  hypeiplane. 

The  following  algorithm  for  the  one-vs-one  multiclass  LS- 
SVM  implementation  for  DOA  estimation  includesp  repro¬ 
cessing,  training,  and  testing  steps. S  pecifically,  the  algorithm 
requires  two  sets  of  projection  vectors  for  each  DDAG  node. 
This  allows  for  automatic  MSE  calculations  at  each  step  of 
the  DDAG  evaluation  path,  thus  providing  a  unique  method 
fore  rror  control  and  validation. 

•  Preprocessing  for  SVM  Training 

1)  Generate  the  DxN  training  signal  vectors  for  the 
C  LS-SVM  classes,  D  is  the  number  of  antenna 
elements,  N  is  the  number  of  samples. 

2)  Generate  the  C  sample  covariance  matrices, 
U,with  M  samples  from  the  £>  x  data  vector. 

3)  Calculate  the  signal  eigenvector,  S,  from  each  of 


the  C  sample  covariance  matrices. 

4)  Calculate  the  D  x  f  projection  vectors,  U-S,  for 
each  of  the  C  classes.  The  ensemble  of  projection 
vectors  consists  of  samples. 

5)  Store  the  projection  vectors  for  the  training  phase 
and  the  eigenvectors  for  the  testing  phase. 

•  LS-SVM  Training 

1)  With  the  C  projection  vectors  train  the 
nodes  with  the  one-vs-one  LS-SVM  algorithm. 

2)  Store  the  LS-SVM  variables,  ai  and  b  from  equa¬ 
tion  (17) ,  which  define  the  hyperplane  separation 
fore  ach  DDAG  node. 

•  Preprocessing  for  SVM  Testing 

1)  Acquire  D  x  N  input  signal  from  antenna  array, 
this  signal  has  unknown  DOAs. 

2)  Generate  the  sample  covariance  matrix  with  M 
samples  from  the  i?  x  data  vector. 

3)  Calculate  the  eigenvectors  for  the  signal  subspace 
and  the  noise  subspace. 

4)  Generate  the  covariance  matricesf  or  each  eigen¬ 
vector. 

«  LS-SVM  Testing  for  the  i/j  DDAG  Node 

1)  Calculate  TWO  Dxl  projection  vectors  with  the 
desired  eigenvector  covariance  matrix  and  the 
and  eigenvectors  fi'om  the  training  phase. 

2)  Test  both  projection  vectorsa  gainst  the  LS-SVM 
hyperplane  for  the  i/j  node.  This  requires  two 
separate  LS-SVM  testing  cycles,  one  with  the 
projection  vector  from  the  t*'*  eigenvector  and  one 
with  the  projection  vector  from  the  eigenvec¬ 
tor. 

3)  Calculate  the  mean  value  of  the  two  LS-SVM 
output  vectors  (labels).  Select  the  mean  value  that 
is  closest  to  a  decision  boundary,  0  or  1.  Compare 
this  value  to  the  label  definition  at  the  node,  then 
select  the  proper  label. 

4)  Repeat  process  for  the  next  DDAG  node  in  the 
evaluation  path  or  declare  the  final  DOA  label. 

•  Error  Control 

1)  Review  the  MSE  calculations  for  the  DDAG  eval¬ 
uation  path. 

2)  Apply  error  control  and  validation  measures  to 
classify  the  label  as  either  an  accurate  DOA  es¬ 
timate  or  as  NOISE. 

VI.  Simulation  Results 

Two  simulation  plots  are  included  below.  Each  simulation 
consists  of  af  our  class  LS-SVM  DDAG  system.  Figure  2 
shows  results  for  a  ten  degree  range  per  class.  Figure  3  shows 
results  for  a  one  degree  range  per  class. 

The  antenna  array  includes  eight  elements,  therefore  the 
training  and  test  signals  were  8x1  vectors.  The  training 
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and  test  signals  are  the  complex  outputs  from  the  antenna 
array.  The  received  complex  signal  ism  odeled  with  a  zero 
mean  normal  distribution  with  unit  variance;  the  additive 
noise  includes  a  zero  mean  distribution  with  a  0.2  variance. 
This  combination  of  signal  and  noise  power  translates  into  a 
7dB  SIR. 

The  system  training  consists  of  six  DDAG  nodes  for  the 
four  DOA  classes.  Both  the  training  and  test  signals  consisted 
of  1500  samples  and  the  window  length  of  the  sample 
covariance  matrix  was  set  to  five.  Therefore  the  training 
and  test  sets  were  composed  of  300  samples  of  each  8x1 
projection  vector. 

To  completely  test  the  LS-SVM  DDAG  system’s  capa¬ 
bilities  the  simulation  were  automated  to  test  a  wide  range 
of  DOAs.  The  DOA  test  set  consisting  of  signals  ranging 
from  three  degrees  before  the  first  DOA  class  to  three 
degrees  after  the  last  DOA  class.  Thus  there  were  forty- 
six  test  signals  for  Figure  2  and  fourteen  test  signals  for 
Figure  3.  As  can  been  seen  from  the  two  plots  the  LS-SVM 
DDAG  DOA  estimation  algorithm  is  extremely  accurate.  No 
misclassifications  were  logged.  Testing  shows  that  the  LS- 
SVM  DDAG  system  accurately  classifies  the  DOAs  fora  ny 
desired  number  of  classes  and  DOA  separations  from  one 
degree  to  twenty  degrees. 


Fig.  2.  LS-SVM  for  DOA  estimation,  four  classes  with  ten  degree 
separation  between  each. 


A.  Decision  Grids 

The  decision  grid  (DG)  technique  was  developed  to  track 
the  DDAG  evaluation  path  and  generate  statistics  to  char¬ 
acterize  the  confidence  level  of  the  DOA  classifications. 
The  theoretical  DG  (T-DG)  is  a  technique  we  developed  to 
quantify  errors  and  add  insighti  nto  the  robustness  of  the  LS- 
SVM  DDAG  architecture.  The  T-DG  is  a  deterministic  2D 
grid  for  DDAGs  with  a  relatively  small  number  of  classes  and 
small  DOA  range  between  classes.T  he  elements  of  the  T-DG 
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Fig.  3.  LS-SVM  for  DOA  estimation,  four  classes  with  one  degree 
separation  between  each. 


represent  the  deterministic  values  of  the  two  LS-SVM  labels 
at  each  DDAG  level,  thed  eterministic  values  are  referred 
to  as“  theoretical  decision  statistics”.  Designing  T-DGs  for 
DDAGs  with  threet  o  five  classes  and  DOA  ranges  up  to 
five  degrees  between  classes  is  straight  forward.  The  T-DGs 
are  not  deterministic  for  large  DOA  ranges,  i.e.  for  a  DOA 
range  of  ten  degrees  between  classes  empirical  results  show 
that  the  DDAG  evaluation  path  isu  npredictable.  The  large 
DOA  ranges  lead  to  uncertainty  in  the  evaluation  path,  even 
though  the  test  DOA  is  classified  correctly. 

Empirical  decision  grids  (E-DG)  are  automatically  gener¬ 
ated  in  the  LS-SVM  DDAG  DOA  estimation  algorithm.  The 
E-DGs  tabulate  the  mean  of  the  LS-SVM  output  label  vectors 
at  each  DDAG  node  and  level,  the  mean  values  are  referred  to 
as  “decision  statistics”.  The  unique  design  of  this  algorithm 
includes  testing  the  input  data  against  two  hyperplanes  at 
the  i*'* /j*'*  node.  With  this  approach  the  two  output  vectors 
at  each  node  are  compared  to  one  another.  In  a  noise-free 
environment,w  ith  perfect  classification,  the  two  label  vectors 
would  be  binary  opposites,  i.e.  one  label  vector  would  be  all 
O's  and  the  other  label  vector  would  be  all  I's.  This  technique 
enables  computation  of  theoretical  mean  square  errors  and 
empirical  mean  square  errors,  refer  to  Section  VI-B. 

Table  II  includes  a  standard  T-DG  and  Tables  III  and  IV 
include  E-DGsf  or  a  three  classD  DAG  with  a  two  degree 
DOA  range  per  class.  The  two  levels  of  a  three  class  DDAG 
are  equivalent  to  the  first  two  levels  of  a  four  class  DDAG, 
refer  to  Figure  1.  Table  II  includes  the  possible  evaluation 
paths  of  thist  hree  class  DDAG.  The  nodesf  or  each  DOA 
evaluation  path  are  included  for  the  first  and  second  DDAG 
level.  For  example,  DOA  1  has  an  evaluation  path  of  Node 
1  vs  3  at  Level  1  and  Node  1  vs  2  at  Level  2.  In  Table  III 
E-DG  presents  the  decision  statistics  for  a  signal  subspace 
eigenveetor;  in  Table  IV  the  second  E-DG  presents  the 
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TABLE  II 

Theoretic  decision  grid  for  a  DDAG  system  with  3  classes 
AND  A  2  DEGREE  DOA  RANGE. 


DOAs 

Classl  Class  2  Class  3 

T-DG,  Level  I 

1 

2 

3 

4 

5 

Node 

lvs3 

lvs3 

lvs3 

lvs3 

lvs3 

Label  0 

0 

0 

0.5 

1 

1 

Label  1 

1 

1 

0.5 

0 

0 

T-DG,  Level  2 

Node 

lvs2 

lvs2 

Ivs2\2vs3 

2vs3 

2vs3 

Label  0 

0 

0.5 

oy 

0.5 

1 

Label  1 

1 

0.5 

1\0 

0.5 

0 

TABLE  III 

Empirical  decision  grid  for  a  signal  eigenvector 


Signal  Data 

DOAs 
Class  2 

Class  3 

E-DG,  Level  1 

HDH 

2 

3 

4 

Node 

lvs3 

lvs3 

lvs3 

0  ___ 

0 

0.032 

• 

Label  1 

immii 

1 

0.576 

0 

E-DG,  Level  2 

!  Ivs2 

lvs2 

2vs3 

IHEJH 

nfRi 

1 

1 

Ubel  1 

IIH^I 

TABLE  IV 

Empirical  decision  grid  for  a  noise  eigenvector 


Noise  Data 

DOAs 

Class  1  Class  2  Class  3 

E-DG,  Level  1 

1 

2 

3 

4 

5 

Node 

lvs3 

1VS3 

lvs3 

lvs3 

1VS3 

Label  0 

0.328 

0.376 

0.304 

0.352 

0.384 

Label  1 

0.752 

0.744 

0.712 

0.768 

0.776 

E-DG,  Level  2 

Node 

lvs2 

lvs2 

Ivs2\2vs3 

2vs3 

2vs3 

Label  0 

0.232 

0.256 

0.144 

0.136 

0.184 

Label  1 

0.896 

0.904 

0.952 

0.944 

0.944 

decision  statistics  for  a  noise  subspace  eigenvector. 

B.  Theoretical  and  Empirical  MSEs 

The  difficulty  in  tracking  the  performance  of  the  LS-SVM 
DDAG  DOA  estimation  algorithm  is  duet  o  the  numerous 
DDAG  evaluation  paths.  For  many  DDAGs  thee  valuation 
paths  can  be  determined  based  on  the  input  data  and  the 
class  definitions.  How  can  decision  statistics  be  applied  to 
performance  characterization? 

The  two  primary  performance  measures  for  the  LS-SVM 
DDAG  are  the  theoretical  MSB  (T-MSE)  and  the  empirical 
MSB  (B-MSB).  Both  MSB  performance  measures  are  based 
on  MSB  calculations  with  T-DGs  and  B-DGs.  The  T-MSB 
is  a  MSB  calculation  between  the  corresponding  elements  of 
the  T-DG  and  the  E-DG.  This  is  a  measure  of  the  algorithm’s 


empirical  decision  statistics  in  relation  to  the“  theoretical” 
decision  statistics.  For  example,  the  T-MSE  for  a  3  class 
DDAG  is  calculated  with  the  T-DG  and  E-DG  presented  in 
Tables  Ila  nd  III.  The  T-MSE  for  Class  2  is  calculated  as 


Level  1 

Level  2 

Label  0 
(0.5-0.032^ 

Label  1 
(0.5  -  0.576)^ 

Label  0 
(1  - 1)^ 

Label  1 
(0  -  0)^ 

Unlike  the  T-MSE,  the  E-MSE  is  at  echnique  that  allows 
for  real-time  error  tracking  with  only  the  empirical  deci¬ 
sion  statistics.  The  E-MSE  uses  onlyt  he  E-DGs  and  the 
differences  between  the  two  LS-SVM  decision  statistics  at 
each  node  in  the  evaluation  path.  Thisi  sa  measure  of  the 
empirical  classification  accuracy  achieved  at  each  DDAG 
node.  The  E-MSE  for  a3  class  DDAG  is  calculated  with 
only  the  E-DG  presented  in  Table  III.  The  MSE  for  Class  2, 
Level  1  is  (|0.032  -  0.576|  -  1)^  =  0.208  and  the  MSE  for 
Class  2,  Level  2  is  (|1  —  0|  —  1)^  =  0. 

C.  Misclassifications  vs.  Gross  Errors 

Two  secondary  performance  measures  for  the  LS-SVM 
DDAG  are  misclassifications  and  grosse  rrors.  These  mea¬ 
sures  are  used  for  performance  characterization  of  the  multi¬ 
class  LS-SVM  DDAG  DOA  estimation  algorithm  and  for 
tracking  variations  in  performance  for  various  algorithm 
parameters.  Misclassifications  and  gross  errors  can  not  be 
used  in  real  time  implementation  because  knowledge  of  the 
test  DOAs  is  required. 

Misclassifications  measure  “small  shifts”  in  DOA  clas¬ 
sifications.  If  a  DOA  is  located  near  a  border  between 
labelst  he  machine  learning  processc  ould  classify  the  data 
to  an  adjacent  label,  not  the  closest  label.  Therefore,  a 
misclassification  is  a  shift  related  error  where  a  signal  is 
detected,  but  classified  to  a  spatially  adjacent  label.  This  type 
of  error  still  gives  an  indication  of  the  received  DOA.  The 
region  of  misclassifications  is  defined  as  ^  of  the  DOA  range 
applied  to  both  sides  of  a  DOA  class. 

Gross  errors  measure  significant  errors  in  DOA  classifica¬ 
tions.!  f  a  DOA  is  classified  into  a  specific  class,b  ut  spatially 
located  at  least  one  entire  class  away,  then  the  error  is  due 
to  a  breakdown  in  the  maehine  learning  process.  This  tyjje 
of  error  assigns  false/misleading  information  to  a  received 
DOA.  The  region  of  gross  errors  is  defined  as  the  magnitude 
of  the  DOA  range  applied  to  both  sides  of  the  DOA  class. 

Figure  4  displays  the  DOA  regions  for  correct  classi¬ 
fications,  misclassifications  and  gross  errors.  This  specific 
example  is  for  a  DDAG  class  centered  at  0°  with  a  5° 
DOA  range,  i.e.,  any  DOA  in  the  range  [—2,  2]  is  correctly 
classified  to  the  0°  class.  The  region  enclosed  by  the  dashed 
brackets  includes  all  DOAs  that  are  correctly  classified  at 
the  0°  class.  If  any  DOAso  utside  the  dashed  bracketsb  ut 
inside  the  solid  brackets  are  assigned  the  0°  class,  then  that 
DOA  would  be  a  misclassification.  If  any  DOAs  outside  the 
solid  bracketsa  re  assigned  to  the  0°  class,  then  that  DOA 
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would  be  a  gross  error.  The  misclassification  region,  for  a 
DOA  classified  at  0°,  is  DO  A  €  [-4,  -3] ,  [3,4] .  The  gross 
error  region,  for  a  DOA  classified  at  0°,  is  DOA  ^  [—4,4] . 


Fig.  4.  Diagram  of  regions  definingD  OA  misclassifications  and  gross 
errors. 


D.  Kernel  Parameters 

Simulation  results  show  that  kernel  selection  has  the 
greatest  effect,  out  of  all  tunable  variables,  in  thee  lassifi- 
cation  process.  The  four  kernels  discussed  in  Section  III- 
A  are  tested  with  the  LS-SVM  DDAG  DOA  estimation 
algorithm.  The  performances  of  each  kernel  function  and 
the  associated  parameters  are  characterized  with  in  terms  of 
MSB,  misclassifications,  and  gross  errors.  In  addition,  the 
LS-SVM  regularization  parameter,  7,  is  varied  to  show  the 
influence  of  the  LS-SVM  complexity. 

1)  Polynomial  Kernel:  The  polynomial  kernel  provides 
the  best  results,  in  relation  to  the  RBF,  MLP,  and  linear 
kernels.  Figure  5  displays  the  T-MSE  in  terms  of  the  poly¬ 
nomial  degree,  d,  and  constant,  6.  The  simulation  is  based 
on  af  our  class  DDAG  with  a  5°  DOA  rangea  nd  a  fixed 
LS-SVM  variable,  7  =  2.  The  results  show  that  the  degree 
of  the  polynomial  kernel  affects  the  DOA  estimation;  the 
best  values  are  d  =  2  and  d  =  4.  For  d  =  1  the  polynomial 
kemeli  s  equivalent  to  the  linear  kernel.  The  MSB  is  constant 
for  1  <  7  <  6,  and  the  polynomial  constant,  0,  does  not 
influence  the  performance.  The  rate  of  misclassifications  is 
1.2%  with  zero  gross  errors.  The  degree  of  the  polynomial  is 
the  only  factor  affecting  the  computational  time  for  system 
training. 

2)  Radial  Basis  Function  Kernel:  The  performance  of 
the  RBF  kernel  is  characterized  in  terms  of  the  LS- 
SVM  regularization  variable,  7,  and  the  smoothing  parame¬ 
ter,  The  simulation  is  based  on  a  four  class  DDAG  with  a 
5°  DOA  range.T  he  results  show  that  the  MSB  is  constant  for 
7  >  1.5,  and  >  0.5.  The  rate  of  misclassifications  is  0.4% 
with  zero  gross  errors.  The  training  time  increases  with  the 
value  of  7  and  for  small  values  of  The  performance  of 


Fig.  5.  Theoretical  MSE  for  a  the  polynomial  kernel,  the  DOA  range 
is  5°  and  spans  the  DDAG  classes  at  30°,  35°,  40°,  45°,  the  LS-SVM 
parameter,  7,  is  set  at  2. 

the  RBF  kernel  matches  the  performance  of  the  polynomial 
kernel  for  DOAs  in  the  range  of  15°  to  60°.  The  performance 
of  the  polynomial  kernel  exceeds  that  of  the  RBF  kernel  for 
DOAs  <15°  and  >  60°. 

3)  Multilayer  Perceptron  Kernel:  Results  show  that  the 
MLP  kernel  is  ineffective  in  maintaining  a  low  MSE  for  the 
range  of  parameters  tested.  The  rate  of  misclassifications  is 
42.5%  and  the  rate  of  gross  errors  is  17.2%.  Overall  the 
performance  of  the  MLP  kernel  is  inferior  to  the  polynomial 
and  RBF  kernels. 

4)  Linear  Kernel:  The  linear  kernel  ise  quivalent  to  the 
polynomial  kernel  with  d  =  1.  Large  MSE  values  show 
that  the  linear  kernel  is  not  effective  in  the  LS-SVM  DOA 
estimation  algorithm.  The  average  T-MSE  is  27.8%  and  the 
average  E-MSE  is  61.1%. 

E.  Training  and  Test  Vectors 

The  design  of  training  sequences  is  an  important  factor  in 
machine  learning  applications.  For  adaptive  antennaa  rrays 
the  training  sequences  represent  the  array  outputs  for  the 
C  DOA  classes.  Three  specific  elements  of  the  training 
sequences  are  noise  variance,  training  vector  length,  and 
length  of  thes  amplec  ovariance  window.  The  requirement 
is  to  design  training  sequences  that  minimize  both  the 
training  error  and  generalization  error.  Empirical  analysis 
of  the  multiclass  LS-SVM  based  DOA  estimation  algorithm 
shows  that  training  error  is  effectively  zero;  the  hyperplane 
separation  of  the  data  in  the  feature  space  is  welld  efined  and 
separable.  In  this  papert  he  generalization  error  is  expressed 
in  terms  of  MSEs,  misclassifications  and  gross  errors. 

The  primary  method  for  training  LS-SVM  DDAG  systems 
for  DOA  estimation  is  based  on  synthetic  training  vectors 
generated  with  known  noise  power  and  preselected  vector 
lengths.  In  practice,  the  training  vectors  would  be  stored  in 
the  memory  of  the  receiver  that  employs  the  DOA  estimation 
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algorithm.  This  approach  allows  for  offline  training  of  the 
binary  LS-SVM  algorithms. 

Simulation  results  show  that  the  LS-SVM  DOA  estimation 
algorithm  is  robust,  in  terms  of  MSB,  when  analyzed  for  a 
range  ofS  IRs  in  the  training  vectors  and  the  test  signals.  In 
general,  the  noise  power  of  the  training  vectors  doesn’t  have  a 
dramatic  effect  on  the  generalization  error.  Simulations  were 
conducted  with  training  vectors  that  included  SIRs  in  the 
range  of  20  dB  to  7  dB.  Review  of  them  isclassification 
and  gross  error  statistics  show  that  training  vectors  with  noise 
variances  of  0.04  and  0.12,  which  correspond  to  SIRs  of  13 
dB  and  10  dB,  provide  the  best  performance. 

1)  Length  of  Training  and  Testing  Vectors:  Figure  6 
includes  two  plots  of  average  theoretical  MSB  versus  training 
vector  length.  The  data  is  specific  to  a  four  class  LS-SVM 
DDAG  system  with  a  four  degree  polynomial  kernel.  The  two 
plots  show  that  the  window  length  of  the  sample  covariance 
matrix  does  not  impact  the  performance.L  ikewise  there  is  no 
correlation  between  the  length  of  the  training  vector  and  the 
MSB.  The  results  in  Figure  6  are  based  on  test  vectors  with 
size  equivalent  to  the  training  vectors.  Figure  7  is  a  3D  plot 
of  the  theoretical  MSB  as  a  function  of  vector  dimensions; 
the  dimensions  of  the  training  vectors  and  input  data  vectors. 
The  length  of  the  input  data  vector  ranges  fi-om  0.5  to  2  times 
the  length  of  the  training  vectors.  The  data  shows  that  range 
of  input  data  vectors  has  no  effect  on  the  MSB  statistics. 


Fig.  6,  Average  theoretical  MSE  as  a  function  of  training  vector  length. 
Two  data  plots  are  included;  one  plot  is  for  a  sample  covariance  matrix  with 
a  five  sample  window,  one  plot  is  for  a  sample  covariance  matrix  with  a  ten 
sample  window. 

Table  V  shows  the  processing  times,  in  seconds,  required 
for  training  a  four  class  LS-SVM  DDAG  system  with  a 
four  degreep  olynomial  kernel,  and  testing  the  input  data. 
The  results  Data  is  included  for  training  and  test  vectors 
that  range  from  25  samples  to  200  samples.  The  simulations 
were  conducted  with  a  Pentium  4  running  at  2.5  GHz.  The 
processing  times  are  relative  to  the  computer  system  and  the 
level  of  optimization  applied  to  the  programming,  but  serve 


Fig.  7.  Theoretical  MSE  as  a  function  of  training  vector  length  and  input 
vector  length.  The  LS-SVM  DDAG  system  includes  four  class  and  a  four 
degree  polynomial  kernel.  The  test  window  multiplier  defines  the  input 
vector  length,  i.e.  the  input  vector  length  ranges  between  0.5  to  2  times 
the  training  vectorl  ength, 

TABLE  V 

Processing  times,  in  seconds,  for  one-vs-one  multiclass 
LS-SVM  FOR  DOA  estimation. 


Vector  Size 

25  50  75  100  125  150  175  200 

Train 

0.30 

0.94 

2.25 

4.49 

7.39 

11,27  15.23 

20.38 

Test 

0.20 

0.23 

0.31 

0.47 

0.56 

0.66  0,72 

0.91 

as  a  basic  indicator  for  possible  hardware  implementation 
and  real-time  applications. 

The  data  in  this  section  shows  that  the  design  of  the 
training  vectors  is  important,  but  there  is  a  tolerance  in  the 
selection  of  noise  power  and  training  vector  length.  The 
available  tolerance  in  choosing  parameters  of  the  training 
vectors  validates  the  design  of  the  LS-SVM  DOA  estimation 
algorithm.  This  characteristic  allows  flexibility  in  the  system 
design  and  provides  a  high  confidence  level  in  the  DOA 
estimates.  In  addition,w  hen  considering  real-time  implemen¬ 
tation  of  the  algorithm,  the  dimensions  of  the  training  vector 
must  be  carefully  reviewed.  Shorter  training  vectors  offer 
high  performance,  in  terms  of  MSB,  and  fast  training  times. 

F.  Range  of  DDAG  Parameters  for  DOA  Estimation 

The  exceptional  performance  of  the  LS-SVM  DDAG  DOA 
estimation  algorithm  has  been  proved  in  thep  revious  sec¬ 
tions.  Most  thep  revious  simulation  results  wereb  ased  on 
three  and  four  class  DDAGs.  To  cover  the  desired  span  of 
the  antenna  array  sector  the  algorithm  must  be  flexible  in  the 
numbero  f  DDAG  classes  and  DOA  ranges.  Different  appli¬ 
cations  require  different  DDAG  architectures.  Many  times 
the  application  will  require  fast  training  and  high  accuracy. 
Training  a  LS-SVM  DDAG  system  can  be  performed  offline. 
But  covering  a  large  anteima  sector  with  high  resolution 
would  require  either: 


ROHWER,  ABDALLAH:  MULTICLASS  LEAST  SQUARES  MACHINES  FOR  DIRECTION  OF  ARRIVAL  ESTIMATION 


107 


TABLE  VI 

Percentage  of  misclassifications  versus  DDAG  classes  (3-6) 
AND  DOA  RANGES  (I- 10). 


Classes 

I 

2 

DOA  Range  between  Classes,  Degrees 

3  4  5  6  7  8  9 

10 

3 

0 

0 

0 

0 

6.7 

0 

4.8 

4.2 

0 

0 

4 

0 

0 

0 

0 

0 

0 

3.6 

3.1 

0 

0 

5 

0 

0 

0 

0 

4.0 

0 

2.9 

0 

6.7 

0 

6 

0 

0 

0 

0 

0 

0 

4.8 

0 

5.6 

U 

1)  A  DDAG  with  a  large  number  of  classes  and  a  small 
DOA  range, 

2)  A  two  stage  system  where  the  antenna  sector  is  parti¬ 
tioned  into  a  set  number  of  classes  with  a  wide  DOA 
range.  First,  the  signal  is  detected  in  a  specific  partition, 
then  a  DDAG  structure  for  high  resolution  can  classify 
the  DOA  with  high  accuracy 

Whatever  the  desired  approach  is,  the  LS-SVM  DDAG  algo¬ 
rithm  must  be  flexible  in  design  and  robust  in  performance. 

The  data  in  this  section  proves  the  performance  for  a  wide 
range  of  DDAG  structures.  Simulations  were  conducted  for 
three  to  ten  classes  with  DOA  ranges  between  1°  and  20°. 
With  these  classes  and  DOA  ranges  the  LS-SVM  DDAG 
algorithms  is  able  to  span  antenna  sectors  of  3°  to  90°. 
Table  VI  lists  the  number  of  misclassifications.  Seventy-five 
percent  of  the  DDAG  structures  with  DOA  ranges  between 
1°  and  10°  have  zero  misclassifications;  the  average  rate  of 
misclassifications  for  the  set  of  DDAG  structures  is  1.2%. 
The  largest  percentage  of  misclassifications  is  6.7%  and 
occurs  with  a  five  class  DDAG  with  a  nine  degree  DOA 
range. 

G.  Multilabel  Capability  for  Multiple  DOAs 

In  DOA  estimation  for  cellular  systems,  there  can  be 
multiple  DOAs  for  a  given  signal.  This  results  from  multipath 
effects  induced  by  the  communication  channel.  The  machine 
learning  system  must  be  able  to  discriminate  between  a  small 
number  of  independent  DOAs  that  include  signal  components 
with  similar  time  delays.  With  this  constraint  the  machine 
learning  algorithm  then  mustb  e  a  multiclass  system  and  able 
to  process  multiple  labels. 

The  machine  learning  algorithm  must  generate  multiclass 
labels,  Vi  e  C,  where  C  e  [-90, 90]  is  a  set  of  real  numbers 
that  represent  an  appropriate  range  of  expected  DOA  values, 
and  multiple  labels  yi,i  =  1 . .  .L  for  L  dominant  signal 
paths.  If  anterma  seetoring  is  used  in  the  cellular  system  the 
multielass  labels  are  from  the  set  C  e  [§i],  where  Si  is  field 
of  view  for  the  sector. 

Multilabel  classification  is  possible  with  the  LS-SVM 
DDAG  algorithm  presented  in  Section  V.  The  machine 
learning  algorithm  for  DOA  estimation  assigns  DOA  labels 
to  each  eigenvector  in  the  signal  subspace.  By  repeating  the 


DDAG  cycle  for  each  eigenvector  the  multiclass  algorithm 
has  the  capability  of  assigning  multiple  labelst  o  the  input 
signal. 

VII.  Comparison  to  Standard  DOA  Estimation 
Algorithms 

Thep  erformanceo  f  the  one-vs-one  multiclass  LS-SVM 
algorithm  for  DOE  estimation  is  described,  in  detail,  in 
the  previous  section.  The  results  show  that  the  multiclass 
classification  approach  to  DOA  estimation  providesu  nique 
benefits,i  n  terms  of  computational  complexity  and  flexibility. 
Each  algorithm  is  trained  for  C  DOA  classes.  The  number 
of  classes  is  dependent  upon  on  the  antenna  sectoring  and 
required  resolution.  The  ideal  application  of  this  technique 
is  CDMA  cellular  systems.  For  a  CDMA  system  the  desired 
interference  suppression  dictates  the  fixed  beamwidth.  A 
reduction  in  beamwidth  corresponds  to  a  reduction  in  MAI, 
thus  reducing  the  required  transmit  power  at  the  mobile 
subscriber.  CDMA  offers  this  flexibility  since  the  all  mobiles 
use  the  same  carrier  frequency.  For  Frequency  Division 
Multiple  Access  (FDMA)  systems  a  narrow  beamwidth  is 
desired,  since  frequency  reuse  factors  into  the  capacity  of  a 
cellular  system,  thus  requiring  accurate  DOA  estimates  with 
high  resolution. 

A.  Computational  Complexity 

Conventional  subspace  based  DOA  estimation  algorithms, 
such  as  MUSIC  and  ESPRIT,  are  computationally  complex. 
The  algorithms  require  accurate  knowledge  of  the  signal 
subspace  dimension  and  accurate  estimates  of  the  signal 
and  noise  subspace  eigenvectors.  Additionally,  the  MUSIC 
algorithm  requires  a  precise  characterization  of  the  antenna 
array  and  the  ESPRIT  algorithm  requires  multiple  eigen 
decompositions. 

Theo  ne-vs-onem  ulticlass  LS-SVM  algorithm  for  DOA 
estimation  is  flexible,  with  respect  to  computationally  re¬ 
quirements.  The  training  cycle  for  the  LS-SVM  based  DDAG 
iss  traight  forward  and  can  be  completed  offline  with  sim¬ 
ulated  data.  The  only  information  required  is  the  size  of 
the  antenna  array  and  the  number  of  DDAG  nodes,  which 
corresponds  to  DOA  classes.F  or  accurate  DOA  estimates  the 
only  information  required,  for  the  LS-SVM  DDAG  testing 
cycle,  is  the  dimension  of  thea  ntenna  array  and  accurate 
eigenvector  estimates  of  the  sample  covariance  matrix.  The 
dimension  of  the  signal  subspace  is  not  required,  nor  is 
accurate  characterization  of  the  antenna  array. 

B.  Simulation  Results 

Figure  8  compares  the  one-vs-one  multiclass  LS-SVM 
DOA  estimation  algorithm  and  the  MUSIC  algorithm.  The 
top  window  shows  perfect  DOA  estimation  for  the  machine 
learning  method  presented  in  this  paper.  The  multiclass 
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algorithm  includes  an  eight  class  DDAG  and  a  one  de¬ 
gree  DOA  range  per  class.  Note  that  multiclass  LS-SVM 
algorithm  classifies  signalso  utside  the  DOA  classest  o  the 
nearest  class,  as  shown  with  theD  OAs  at  12°  —  14°  and 
23°  —  25°.  The  bottom  window  displays  the  DOA  estimation 
with  the  MUSIC  algorithm,!  00  DOA  estimates  are  averaged 
for  each  received  signal  and  the  amplitudes  are  normalized 
to  the  largest  estimate.  Thep  lots  show  that  ther  esolution 
capabilities  one-vs-one  multiclass  LS-SVM  DOA  estimation 
algorithm  equal  that  of  the  MUSIC  algorithm.One  drawback 
of  the  MUSIC  algorithm  is  the  broad  width  of  the  DOA 
estimate;  a  level  detection  step  is  required  to  accurately  select 
the  maximum  response. 

Figure  9  compares  the  errors  and  DOA  estimates  ofe  ach 
algorithm.  For  this  simulation  the  one-vs-one  multiclass  LS- 
SVM  algorithm  includes  a  seventeen  class  DDAG  and  a 
five  degree  DOA  range  per  class.  The  top  window  plots 
the  errors!  n  the  DOA  estimates  for  ninety  degree  antenna 
sector  and  one  DOA  sample  per  degree.  The  definitionso  f 
an  error  are  specific  to  the  two  algorithms.  For  the  machine 
learning  based  algorithm,  an  error  is  defined  as  a  DOA  that  is 
classified  into  a  wrong  DOA  class.  For  the  MUSIC  algorithm 
an  error  is  the  difference  between  the  estimated  DOA  and 
the  actual  DOA.  As  shown  in  the  top  window,  the  only 
errors  associated  with  theL  S-SVM  based  algorithm  occur 
for  DOAs  greater  than  82°.  The  DOAs  in  error  are  classified 
into  the  spatially  adjacent  DOA  class  at  80°.  Likewise,  the 
errors  associated  with  the  MUSIC  algorithm,  that  are  greater 
than  l°,o  ccur  for  DOAs  greater  than  70°.  The  plots  in  Figure 
9  prove  the  robust  performance  of  the  one-vs-one  multiclass 
LS-SVM  algorithm  for  DOA  estimation. 


DOAs  Calculated  with  LS-SVM 


DOAs 


Fig.  8.  Comparision  between  the  LS-SVM  based  DOA  estimation  algorithm 
and  the  MUSIC  algorithm.  The  one-vs-one  multiclass  LS-SVM  DOA 
estimation  algortihm  includes  eight  classes  and  a  one  degree  DOA  range. 


DOAs 

Fig.  9.  Comparision  of  errors  and  estimated  DOAs  for  the  LS-SVM 
based  DOA  estimation  algorithm  and  the  MUSIC  algorithm.  The  one-vs-one 
multiclass  LS-SVM  DOA  estimation  algortihm  includes  seventeen  classes 
and  a  five  degree  DOA  range. 

C.  Benefits  over  Standard  Techniques 

Evaluation  of  the  performance  statistics.  Section  VI, 
proves  that  the  one-vs-one  multiclass  LS-SVM  algorithm  for 
DOA  estimation  is  reliable  with  a  high  degree  of  accuracy.  In 
terms  of  performance  our  new  algorithm  provides  the  same 
capabilities  as  the  standard  DOA  estimation  methods.S  pecif- 
ically,  accurate  DOA  estimates,  to  a  one  degree  resolution, 
can  be  achieved  with  the  standard  subspace  based  algorithms 
and  our  machine  learning  based  algorithm.  The  primary 
benefits  of  our  LS-SVM  based  DOA  estimation  algorithm  are 
the  reduced  computational  complexity,  described  above,  and 
the  flexibility,  in  terms  of  DOA  classes  versus  requirements. 
The  specific  application  dictates!  he  desired  resolution  and 
therefore  the  number  of  DOA  classes.  For  example,  one 
application  may  include  a  sixty  degree  antenna  sector  and  a 
desired  resolution  of  ten  degrees.  These  requirements  would 
translate  into  a  seven  class  system.  Another  application  may 
include  a  twenty  degree  sector  and  a  desired  resolution  of  two 
degrees;  this  would  translate  into  a  eleven  class  system.  An 
additional  option  is  to  place  two  DDAG  systems  in  series,  as 
described  in  Section  VI-F,  that  allows  for  a  high  resolution 
with  a  small  number  of  classes.  In  general,  the  one-vs-one 
multiclass  LS-SVM  algorithm  for  DOA  estimation  can  be 
adapted  to  specific  requirements,  as  influenced  by  system 
capacity,  channel  conditions,  and  available  computational 
resources.  The  MUSIC  and  ESPRIT  algorithms  offer  no 
flexibility,  in  terms  of  DOA  resolution  and  computational 
resources. 

VIII.  Conclusion 

In  this  paper  we  presented  a  machine  learning  architecture 
for  DOA  estimation  as  applied  to  a  CDMA  cellular  system. 


ROHWER,  ABDALLAH:  MULTICLASS  LEAST  SQUARES  MACHINES  FOR  DIRECTION  OF  ARRIVAL  ESTIMATION 


109 


The  broad  range  of  our  research  in  machine  learning  based 
DOA  estimation  includesm  ulticlass  and  multilabe!  classifi¬ 
cation,  classification  accuracy,  error  control  and  validation, 
kernel  selection,  estimation  of  signal  subspace  dimension, 
and  overall  performance  characterization.  Wep  resented  an 
overview  of  a  multiclass  SVM  learning  method  ands  uc- 
cessful  implementation  of  a  one-vs-one  multiclass  LS-SVM 
DDAG  system  forD  OA  estimation. 

The  LS-SVM  DOA  estimation  algorithm  is  superior  to 
standard  techniquesd  ue  to  the  robust  design  that  is  insen¬ 
sitive  to  received  SIR,  Doppler  shift,  size  of  the  antenna 
array,  and  the  computational  requirementsa  re  adaptable  to 
the  desired  applications.  The  algorithm  was  designed  with 
a  multiclass,  multilabel  capability  and  includes  an  error 
control  and  validation  process.  In  addition,  there  are  many 
limitations  of  standard  DOA  estimation  algorithms,  ESPRIT 
and  MUSIC,  that  do  not  exist  with  the  LS-SVM  DOA 
estimation  algorithm. 

The  LS-SVM  algorithm  for  DOA  estimation  assigns  DOA 
labels  to  each  eigenvector  in  the  signal  subspace.  By  re¬ 
peating  the  DDAG  cycle  for  each  eigenvector  the  multiclass 
algorithm  hast  he  capability  of  assigning  multiple  labels!  o 
the  input  signal.  Simulation  results  show  a  high  degree  of 
accuracy  and  prove  that  the  LS-SVM  DDAG  system  has  a 
wide  range  of  performance  capabilities.  The  results  show 
that  the  algorithm  is  accurate  for  a  large  range  of  DDAG 
performance  independent  of  DDAG  class  or  DOA  range  per 
class.  The  LS-SVM  DDAG  system  accurately  classifies  the 
DOAs  for  three  to  ten  classes  and  DOA  ranges  from  one 
degree  to  twenty  degrees. 
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ABSTRACT:  Neural  models  for  calculating  the 
bandwidth  of  electrically  thin  and  thick  rectangular 
microstrip  antennas,  based  on  the  multilayered 
perceptrons  and  the  radial  basis  function  networks, 
are  presented.  Thirteen  learning  algorithms,  the 
conjugate  gradient  of  Fletcher-Reeves,  Levenberg- 
Marquardt,  scaled  conjugate  gradient,  resilient 
backpropagation,  conjugate  gradient  of  Powell-Beale, 
conjugate  gradient  of  Polak-Ribiere,  bayesian 
regularization,  one-step  secant,  backpropagation  with 
adaptive  learning  rate,  Broyden-Fletcher-Goldfarb- 
Shanno,  backpropagation  with  momentum,  directed 
random  search  and  genetic  algorithm,  are  used  to 
train  the  multilayered  perceptrons.  The  radial  basis 
function  network  is  trained  by  the  extended  delta-bar- 
delta  algorithm.  The  bandwidth  results  obtained  by 
using  neural  models  are  in  very  good  agreement  with 
the  experimental  results  available  in  the  literature. 
When  the  performances  of  neural  models  are 
compared  with  each  other,  the  best  results  for  training 
and  test  were  obtained  from  the  multilayered 
perceptrons  trained  by  the  conjugate  gradient  of 
Powell-Beale  and  Broyden-Fletcher-Goldfarb-Shanno 
algorithms,  respectively. 

1.  INTRODUCTION 

Microstrip  antennas  (MSAs)  have  become  the 
favorite  choice  of  antenna  designers  because  they 
offer  the  attractive  features  of  low  profile,  light 
weight,  low  cost,  conformability  to  curved  surfaces, 
ease  of  manufacture,  and  compatibility  with 
integrated  circuit  technology  [1-18].  A  number  of 
methods  [1-36]  using  different  levels  of 
approximation  have  been  proposed  and  used  to 
compute  the  bandwidth  of  rectangular  MSA,  as  this  is 
one  of  the  most  popular  and  convenient  shapes.  These 
methods  can  generally  be  divided  into  two  groups: 
simple  analytical  methods  and  rigorous  numerical 
methods.  Simple  analytical  methods  can  give  a  good 
intuitive  explanation  of  antenna  radiation  properties. 
However,  these  methods  do  not  consider  rigorously 
the  effects  of  surface  waves.  Exact  mathematical 
formulations  in  rigorous  methods  involve  extensive 
numerical  procedures,  resulting  in  round-off  errors, 
and  may  also  need  final  experimental  adjustments  to 
the  theoretical  results.  These  methods  also  require 


high  performance  large-scale  computer  resources  and 
a  very  large  number  of  computations.  Furthermore, 
most  of  the  previous  theoretical  and  experimental 
work  has  been  carried  out  only  with  electrically  thin 
MSAs,  normally  of  the  order  of  h/Xj  S  0.02,  where  h 
is  the  thickness  of  the  dielectric  substrate  and  Xj  is  the 
wavelength  in  the  substrate.  Recent  interest  has 
developed  in  radiators  etched  on  electrically  thick 
substrates.  The  need  for  theoretical  and  experimental 
studies  of  MSAs  with  electrically-thick  substrates  is 
motivated  by  several  major  factors.  Among  these  is 
the  fact  that  MSAs  are  currently  being  considered  for 
use  in  millimetre-wave  systems.  The  substrates 
proposed  for  such  applications  often  have  high 
relative  dielectric  constants  and,  hence,  appear 
electrically  thick.  The  need  for  greater  bandwidth  is 
another  reason  for  studying  thick  substrate  MSAs. 
Consequently,  this  problem,  particularly  the 
bandwidth  aspect,  has  received  considerable 
attention. 

In  this  paper,  models  based  on  artificial  neural 
networks  (ANNs)  are  presented  for  the  bandwidth  of 
both  electrically  thin  and  thick  rectangular  MSAs. 
Ability  and  adaptability  to  learn,  generalizability, 
smaller  information  requirement,  fast  real-time 
operation,  and  ease  of  implementation  features  have 
made  ANNs  popular  in  the  last  few  years  [37-40]. 
Because  of  these  fascinating  features,  artificial  neural 
networks  in  this  article  are  used  to  model  the 
relationship  between  the  parameters  of  MSA  and  the 
measured  bandwidth  results. 

In  previous  works  [35,41-48],  we  also  successfully 
introduced  ANNs  to  compute  the  various  parameters 
of  the  triangular,  rectangular  and  circular  MSAs.  In 
reference  [35],  the  bandwidth  of  rectangular  MSAs 
has  been  computed  by  using  ANNs.  In  [35],  only  the 
multilayered  perceptrons  (MLPs)  were  used  as  the 
neural  network  architecture.  However,  in  this  paper, 
both  the  MLPs  and  the  radial  basis  function  networks 
(RBFNs)  are  used  for  calculating  the  bandwidth. 
Furthermore,  in  [35],  the  four  learning  algorithms,  the 
backpropagation  (BP)  [49],  the  delta-bar-delta  (DBD) 
[50],  the  quick  propagation  (QP)  [51],  and  the 
extended  delta-bar-delta  (EDBD)  [52],  are  used  to 
train  the  MLPs.  However,  in  this  paper,  thirteen 
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learning  algorithms,  conjugate  gradient  of  Fletcher- 
Reeves  (CGFR)  [53],  Levenberg-Marquardt  (LM) 
[54,55],  scaled  conjugate  gradient  (SCG)  [56], 
resilient  backpropagation  (RP)  [57],  Broyden- 
Fletcher-Goldfarb-Shanno  (BFGS)  [58],  conjugate 
gradient  of  Powell-Beale  (CGPB)  [59,60],  conjugate 
gradient  of  Polak-Ribiere  (CGPR)  [61],  bayesian 
regularization  (BR)  [62],  one-step  secant  (OSS)  [63], 
backpropagation  with  adaptive  learning  rate 
(BPALR)  [61],  backpropagation  with  momentum 
(BPM)  [61],  directed  random  search  (DRS)  [64]  and 
genetic  algorithm  (GA)  [65,66]  are  used  to  train  the 
MLPs.  The  radial  basis  function  network  is  trained  by 
extended  delta-bar-delta  (EDBD)  algorithm.  The 
main  aims  of  this  paper  are 

•  to  calculate  the  bandwidth  of  electrically  thin 
and  thick  rectangular  MSAs  by  using  the 
MLPs  and  RBFNs  architectures; 

•  to  train  the  MLPs  by  the  CGFR,  LM,  SCG, 
RP,  BFGS,  CGPB,  CGPR,  BR,  OSS, 
BPALR,  BPM,  DRS,  and  GA,  and  to  train  the 
RBFNs  by  the  EDBD  algorithm; 

•  to  compare  the  bandwidth  results  of  neural 
models  presented  in  this  paper  with  the 
results  of  the  conventional  methods  available 
in  the  literature; 

•  to  compare  also  the  bandwidth  results  of 
neural  models  presented  in  this  paper  with  the 
results  of  fuzzy  inference  systems  [36] 
trained  by  the  improved  tabu  search 
algorithm  (ITS A)  [67],  the  modified  tabu 
search  algorithm  (MTSA)  [68]  and  the 
classical  tabu  search  algorithm  (CTSA) 
[69,70],  and  \vith  the  results  of  the  neural 
models  [35]  trained  by  the  BP,  DBD,  QP,  and 
EDBD  algorithms; 

•  to  determine  the  most  appropriate  neural 
model  in  calculating  the  bandwidth  of 
rectangular  MSAs;  and 

•  to  show  the  superiority  of  artificial 
intelligence  techniques  such  as  neural 
networks  and  fuzzy  inference  systems  over 
the  conventional  methods. 

In  the  following  sections,  the  bandwidth  of  the 
MSAs,  the  ANNs,  the  MLPs  and  the  RBFNs  are 
described  briefly,  and  the  application  of  neural 
networks  to  the  calculation  of  the  bandwidth  of  a 
MSA  is  then  explained. 

2.  BANDWIDTH  OF  A  RECTANGULAR 
MICROSTRIP  ANTENNA 
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Figure  1.  Geometry  of  rectangular  microstrip 
antenna. 

where  s  is  voltage  standing  wave  ratio  (VSWR),  and 
Qt  is  the  total  quality  factor.  The  total  quality  factor, 
Qr,  can  be  written  as 


Qt 


- -F - -F - -F - 

Qr  Qc  Qd  QsJ 


(2) 


where  the  four  terms  represent  the  radiation  quality 
factor,  the  quality  factors  due  to  conductor  loss, 
dielectric  loss,  and  surface  wave. 

Bandwidth  was  defined  by  Pozar  [23]  as  the  half¬ 
power  width  of  the  equivalent  circuit  impedance 
response.  For  a  series-type  resonance,  this  bandwidth 
is 


BW  = 


2R 


dw 


(3) 


where  Z=R+jX  is  the  input  impedance  at  the  radian 
resonant  frequency  Wp  For  a  parallel-type  resonance, 
(3)  is  used  with  R  replaced  by  G  and  X  replaced  by  B, 
where  Y=G+jB  is  the  input  admittance  at  resonance. 
The  derivative  in  (3)  can  be  evaluated  by  calculating 
the  input  impedance  at  two  frequencies  near 
resonance  and  using  a  finite  difference 
approximation.  The  resonant  resistance,  R,  is  given 
by 


Figure  1  illustrates  a  rectangular  patch  of  width  fV 
and  length  L  over  a  ground  plane  with  a  substrate  of 
thickness  h  and  a  relative  dielectric  constant  Br-  The 
bandwidth  of  this  antenna  can  be  written  as  [1] 

BW  =  -^  (1) 

QtVs 


R=R,-FRd-FR,-FR,  (4) 

where  the  four  terms  represent  the  radiation 
resistance,  the  equivalent  resistance  of  the  dielectric 
loss,  the  equivalent  resistance  of  the  conductor  loss, 
and  surface  wave  radiation  resistance.  The  certain 
way  of  calculating  the  total  quality  factor  and  the 
resonant  resistance  of  both  electrically  thin  and  thick 
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rectangular  microstrip  patch  antennas  involves  the 
complicated  Green  function  methods  and  integral 
transformation  techniques.  These  methods  and 
techniques  suffer  from  a  lack  of  computational 
efficiency,  which  in  practice  can  restrict  their 
usefulness  because  of  high  computational  time  and 
costs. 


In  this  work,  a  new  technique  based  on  the  ANNs  for 
solving  this  problem  efficiently  is  presented.  First,  the 
antenna  parameters  related  to  the  bandwidth  are 
determined,  then  the  bandwidth  depending  on  these 
parameters  is  calculated  by  using  the  ANNs. 


The  feeding  method  or  position  is  not  considered  in 
calculating  the  bandwidth  because  the  feeding 
method  or  position  does  not  effect  the  intrinsic  patch 
bandwidth.  The  bandwidth  of  a  patch  is  significantly 
greater  than  that  of  a  printed  dipole,  at  least  over  the 
range  for  which  the  patch  actually  resonates 
(h<0.12Ao,  where  Xo  is  the  free  space  wavelength  at 
the  resonant  frequency  /).  This  fact  is  consistent  with 
the  antenna  gaiii/bandwidth  relation  to  antenna  size. 
Therefore,  the  effect  of  the  patch  width  (V  on  the 
bandwidth  of  rectangular  microstrip  antennas  must  be 
taken  into  consideration  in  the  bandwidth  calculation 
of  these  antennas.  From  the  results  of  the  methods 
available  in  the  literature  [1-36]  we  see  that  for  a 
given  frequency,  larger  bandwidth  is  possible  by 
choosing  a  thicker  substrate  and  a  wider  patch.  The 
results  also  indicate  that  a  lower  value  of  e,  results  in 
a  larger  bandwidth. 


It  is  clear  from  the  methods  and  formulas  presented 
by  [1-36]  that  only  three  parameters,  h/Aj,  fV,  and  the 
dielectric  loss  tangent  tanS,  are  needed  to  describe  the 
bandwidth.  The  wavelength  in  the  dielectric  substrate, 
Xj,  is  given  as 


^0  _  c 


(5) 


where  c  is  the  velocity  of  electromagnetic  waves  in 
free  space. 


3.  ARTIFICIAL  NEURAL  NETWORKS  (ANNs) 


ANNs  are  biologically  inspired  computer  programs 
designed  to  simulate  the  way  in  which  the  human 
brain  processes  information.  ANNs  gather  their 
knowledge  by  detecting  the  patterns  and  relationships 
in  data  and  learn  (or  are  trained)  through  experience, 
not  from  programming.  An  ANN  is  formed  from 
hundreds  of  single  units,  artificial  neurons  or 
processing  elements  connected  with  weights,  which 
constitute  the  neural  structure  and  are  organised  in 
layers.  The  power  of  neural  computations  comes  from 
weight  connection  in  a  network.  Each  neuron  has 
weighted  inputs,  summation  function,  transfer 


function  and  one  output.  The  behaviour  of  a  neural 
network  is  determined  by  the  transfer  functions  of  its 
neurons,  by  the  learning  rule,  and  by  the  architecture 
itself.  The  weights  are  the  adjustable  parameters  and, 
in  that  sense,  a  neural  network  is  a  parameterised 
system.  The  weighted  sum  of  the  inputs  constitutes 
the  activation  of  the  neuron.  The  activation  signal  is 
passed  through  a  transfer  function  to  produce  the 
output  of  a  neuron.  Transfer  function  introduces  non¬ 
linearity  to  the  network.  During  training,  the  inter¬ 
unit  connections  are  optimised  until  the  error  in 
predictions  is  minimised  and  the  network  reaches  the 
specified  level  of  aceuracy.  Onee  the  network  is 
trained,  new  unseen  input  information  is  entered  to 
the  network  to  ealculate  the  output  for  test.  ANN 
represents  a  promising  modelling  technique, 
especially  for  data  sets  having  non-linear 
relationships  that  are  frequently  encountered  in 
engineering.  In  terms  of  model  specification,  artificial 
neural  networks  require  no  knowledge  of  the  data 
source  but,  since  they  often  eontain  many  weights 
that  must  be  estimated,  they  require  large  training 
sets.  In  addition,  ANNs  ean  combine  and  incorporate 
both  literature-based  and  experimental  data  to  solve 
problems. 

There  are  many  types  of  neural  networks  for  various 
applications  available  in  the  literature  [37-40,71]. 
RBFNs  and  MLPs  are  examples  of  feed-forward 
networks  and  both  universal  approximators.  In  spite 
of  being  different  networks  in  several  important 
respects,  these  two  neural  network  architectures  are 
capable  of  accurately  mimicking  each  other  [40]. 

3.1.  Multilayered  Perceptrons  (MLPs) 

Multilayered  perceptrons  (MLPs)  [40,49]  are  the 
simplest  and  therefore  most  commonly  used  neural 
network  architectures.  They  have  been  adapted  for 
the  calculation  of  the  bandwidth  of  the  MSA.  MLPs 
can  be  trained  using  many  different  learning 
algorithms  [37-40,71].  In  this  paper,  MLPs  are 
trained  with  the  CGFR,  LM,  SCG,  RP,  BFGS, 
CGPB,  CGPR,  BR,  OSS,  BPALR,  BPM,  DRS,  and 
GA.  As  shown  in  Figure  2,  an  MLP  eonsists  of  three 
layers:  an  input  layer,  an  output  layer  and  an 
intermediate  or  hidden  layer.  Neurons  (indicated  in 
Figure  2  with  the  circle)  in  the  input  layer  only  act  as 
buffers  for  distributing  the  input  signals  *,•  to  neurons 
in  the  hidden  layer.  Each  neuron  j  in  the  hidden  layer 
sums  up  its  input  signals  x,-  after  weighting  them  with 
the  strengths  of  the  respective  connections  Wp  from 
the  input  layer  and  computes  its  output  yj  as  a 
function / of  the  sum,  viz., 

yj=f(IWjiXi)  (6) 

/ can  be  a  simple  threshold  function,  a  sigmoidal  or 
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Figure  2.  General  form  of  multilayered  perceptrons. 


hyperbolic  tangent  function.  The  output  of  neurons  in 
the  output  layer  is  computed  similarly. 


Training  a  network  consists  of  adjusting  weights  of 
the  network  using  the  different  learning  algorithms.  A 
learning  algorithm  gives  the  change  Awji(k)  in  the 
weight  of  a  connection  between  neurons  i  and  j  at 
time  k.  The  weights  are  then  updated  according  to  the 
following  formula 


Wji(k  +  l)  =  Wji(k)  +  Awji(k  +  l) 


(7) 


3.2.  Radial  Basis  Function  Networks  (RBFNs) 

An  alternative  network  architecture  to  the  MLP  is  the 
RBFN  [72-74].  A  network  with  an  internal 
representation  of  hidden  neurons,  radially  symmetric, 
is  named  as  a  RBFN.  The  topology  of  the  RBFN  is 
obviously  similar  to  that  of  the  three-layered  MLP, 
and  the  differences  lie  in  the  characteristics  of  the 
hidden  neurons.  The  structure  of  a  RBFN  is  shown  in 
Figure  3. 

The  construction  of  a  RBFN  in  its  most  basic  form 
involves  three  entirely  different  layers.  The  input 
layer  is  made  up  of  source  neurons.  The  second  layer 
is  a  hidden  layer  of  high  dimension  serving  a 
different  purpose  from  that  in  a  MLP.  This  layer 
consists  of  an  array  of  neurons.  Each  neuron  contains 
a  parameter  vector  called  a  centre.  The  neuron 
calculates  the  Euclidean  distance  between  the  centre 
and  the  network  input  vector,  and  passes  the  result 
through  a  non-linear  function.  The  output  layer  is 
essentially  a  set  of  linear  combiners  and  supplies  the 
response  of  the  network.  The  transformation  from 
input  layer  to  the  hidden  layer  is  non-linear,  whereas 
the  transformation  from  the  hidden  layer  to  the  output 
layer  is  linear. 


yi  yi  y„ 


ft  t 


The  output  of  an  hidden  layer  is  a  function  of  the 
distance  between  the  input  vector  and  the  stored 
centre  and  calculated  as 


Ok=||X-C,||  =  .Ji(Xi-C,J  (8) 

The  learning  consists  of  using  a  clustering  algorithm 
for  determining  the  cluster  centres  (C*)  and  a  nearest 
neighbour  heuristic  for  determining  the  cluster 
centres.  Linear  regression,  or  a  gradient  descent 
algorithm  is  used  to  determine  the  weights  from  the 
hidden  layer  to  the  output  layer.  In  this  work,  EDBD 
algorithm  is  used  to  train  the  weights  of  the  layer. 

4.  NEURAL  NETWORKS  FOR  BANDWIDTH 
COMPUTATION 

ANNs  have  been  adapted  for  the  calculation  of  the 
bandwidth  (BW)  of  electrically  thin  and  thick 
rectangular  microstrip  antennas.  MLPs  are  trained 
with  the  use  of  CGFR,  LM,  SCG,  RP,  BFGS,  CGPB, 
CGPR,  BR,  OSS,  BPALR,  BPM,  DRS,  and  GA 
algorithms.  RBFN  is  trained  by  using  EDBD 
algorithm.  For  the  neural  models,  the  inputs  are  h/Xj, 
W,  and  tan5,  and  the  output  is  the  measured 
bandwidth  BWme.  A  neural  model  used  in  calculating 
the  BW  is  shown  in  Figure  4. 

For  the  MLPs  trained  by  DRS  and  GA,  input  layer 
has  the  linear  transfer  function,  the  hidden  and  output 
layers  have  the  sigmoid  function.  For  the  MLPs 
trained  by  the  other  learning  algorithms,  the  input  and 
output  layers  have  the  linear  transfer  function  and  the 
hidden  layers  have  the  tangent  hyperbolic  function.  In 
the  RBFNs,  the  sigmoid  function  was  used  for  the 
output  layer.  Training  an  ANN  with  the  use  of  a 
learning  algorithm  to  compute  the  bandwidth 
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Figure  4.  Neural  model  for  bandwidth  computation. 


involves  presenting  it  sequentially  with  different  sets 
(h/Xj,  W,  tan6)  and  corresponding  measured  values 
BWme-  Differences  between  the  target  output  BWme 
and  the  actual  output  of  the  ANNs  are  evaluated  by  a 
learning  algorithm.  The  adaptation  is  carried  out  after 
the  presentation  of  each  set  (h/Xd,  W,  tanS)  until  the 
calculation  accuracy  of  the  network  is  deemed 
satisfactory  according  to  some  criterion  (for  example, 
when  the  error  between  BWme  and  the  actual  output 
for  all  the  training  set  falls  below  a  given  threshold) 
or  the  maximum  allowable  number  of  epochs  or 
generations  is  reached. 

The  training  and  test  data  sets  used  in  this  paper  have 
been  obtained  from  the  previous  experimental  works 
[33,34],  and  are  given  in  Table  1.  The  27  data  sets  in 
Table  1  were  used  to  train  the  networks.  6  test  data 
sets  which  are  marked  with  an  asterisk  in  Table  1 
were  used  for  test.  The  number  of  neurons  in  the 
hidden  layers  and  train  epochs  for  neural  models 
presented  here  are  given  in  Table  2.  10x7x8  in 
Table  2  means  that  the  number  of  neurons  was  10,  7, 
and  8  for  the  first,  second,  and  third  hidden  layers, 
respectively.  Initial  weights  of  the  neural  models 
were  set  up  randomly. 

5.  RESULTS  AND  CONCLUSIONS 

The  bandwidths  calculated  by  using  neural  models 
presented  in  this  paper  for  electrically  thin  and  thick 
rectangular  microstrip  patch  antennas  are  listed  in 
Table  3.  For  comparison,  the  results  obtained  by 
using  the  conventional  methods  [1,21,31-33],  and  the 
neural  models  presented  by  [35]  and  the  fuzzy 
inference  systems  [36]  are  given  in  Table  4.  EDBD, 
DBD,  BP,  QP,  ITSA,  CTSA,  and  MTSA  in  Table  4 
represent,  respectively,  the  bandwidths  calculated  by 
the  neural  models  [35]  trained  by  EDBD,  DBD,  BP, 
QP,  and  calculated  by  the  fuzzy  inference  systems 
[36]  trained  by  ITSA,  CTSA,  and  MTSA.  The  total 
absolute  errors  between  the  computed  and 
experimental  results  for  neural  models,  fuzzy 
inference  systems,  and  conventional  methods  are 
listed  in  Table  5  and  Table  6. 


Table  1.  Measured  bandwidth  results  and 
dimensions  for  electrically  thin  and  thick 
rectangular  microstrip  antennas.  _ _ 


Patch 

No 

W 

(mm) 

Measured 

h 

(mm) 

f,(GHz) 

hA^ 

tanS 

[33,34] 
BWme  (%) 

1 

0.17 

7.740 

0.0065 

8.50 

0.001 

1.07 

2 

0.79 

3.970 

0.0155 

20.00 

0.001 

2.20 

3 

0.79 

7.730 

0.0326 

10.63 

0.001 

3.85 

4 

0.79 

3.545 

0.0149 

20.74 

0.002 

1.95 

5 

1.27 

4.600 

0.0622 

9.10 

0.001 

2.05 

6 

1.57 

5.060 

0.0404 

17.20 

0.001 

5.10 

7 

1.57 

4.805 

0.0384 

18.10 

0.001 

4.90* 

8 

1.63 

6.560 

0.0569 

12.70 

0.002 

6,80 

9 

1.63 

5.600 

0.0486 

15.00 

0.002 

5.70 

10 

2.00 

6.200 

0.0660 

13.37 

0.002 

7.70* 

II 

2.42 

7.050 

0.0908 

11.20 

0.002 

10,90 

12 

2.52 

5.800 

0.0778 

14.03 

0.002 

9.30 

13 

3.00 

5.270 

0.0833 

15.30 

0.002 

10.00 

14 

3.00 

7.990 

0.1263 

9.05 

0.002 

16.00* 

15 

3.00 

6.570 

0.1039 

11.70 

0.002 

13.60 

16 

4.76 

5.100 

0.1292 

13.75 

0,002 

15.90 

17 

3.30 

8.000 

0.1405 

7.76 

0.002 

17.50 

18 

4.00 

7.134 

0.1519 

7.90 

0.002 

18.20* 

19 

4.50 

6.070 

0.1454 

9.87 

0.002 

17,90 

20 

4.76 

5.820 

0.1475 

10.00 

0.002 

18.00 

21 

4.76 

6.380 

0.1617 

8.14 

0.002 

19.00 

22 

5.50 

5.990 

0.1754 

7.90 

0.002 

20.00 

23 

6.26 

4.660 

0.1553 

12.00 

0.002 

18.70 

24 

8.45 

4.600 

0.2091 

7.83 

0.002 

20.90 

25 

9.52 

3.580 

0.1814 

12.56 

0.002 

20.00 

26 

9.52 

3.980 

0.2017 

9.74 

0.002 

20.60 

27 

9.52 

3.900 

0.1976 

10.20 

0.002 

20.30* 

28 

10.00 

3.980 

0.2119 

8.83 

0.002 

20.90 

29 

11.00 

3.900 

0.2284 

7.77 

0.002 

21.96 

30 

12.00 

3.470 

0.2216 

9.20 

0.002 

21.50 

31 

12.81 

3.200 

0.2182 

10.30 

0.002 

21.60 

32 

12.81 

2.980 

0.2032 

12.65 

0.002 

20.40 

33 

12.81 

3.150 

0.2148 

10.80 

0.002 

21.20* 

*Test  data  sets 


When  the  performances  of  neural  models  presented  in 
this  paper  and  in  [35]  are  compared  with  each  other, 
the  best  results  for  training  and  test  were  obtained 
from  the  MLP  network  trained  by  the  CGPB  and 
BFGS,  respectively,  as  shown  in  Table  5.  However, 
among  the  neural  models,  the  highest  accuracy  in  the 
total  absolute  errors  was  achieved  with  the  CGFR 
algorithm.  When  the  two  heuristic  approaches  were 
compared  with  each  other,  the  results  of  DRS  were 
found  better  than  those  of  the  GA.  It  is  also  clear 
from  Table  5  that  in  most  cases  the  results  of  neural 
models  presented  in  this  paper  are  better  than  those  of 
the  neural  models  presented  by  [35]  and  that  the  best 
result  in  the  total  absolute  errors  is  obtained  from  the 
fuzzy  inference  systems  trained  by  ITSA.  However, 
the  train  absolute  error  of  the  fuzzy  inference  systems 
trained  by  ITSA  is  larger  than  that  of  the  MLPs 
trained  by  CGFR,  LM,  SCG,  CGPB,  and  CGPR 
algorithms. 
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Table  2.  The  ANN  configurations  and  the  number  of 
train  epochs  for  neural  models  presented  in  this  paper. 


ANN  Architectures/ 
Algorithms 

The  number  of 
neurons  in  the 
hidden  layers 

The  number 
of  train 
epochs 

MLPs 

CGFR 

10x7x8 

2  500 

LM 

6x3 

201 

SCG 

11  x8x7 

1  200 

RP 

12x  10 

6  500 

BFGS 

10x5 

700 

CGPB 

7x7x4 

1  500 

CGPR 

7x7x4 

1  500 

BR 

3x4x3 

290 

OSS 

10  X  8  X  8 

2  500 

BPALR 

45  X  35  X  35 

2  500 

BPM 

45  X  35  X  35 

5  000 

DRS 

12x6 

740 

GA 

20x25 

1  850 

RBFN 

EDBD 

20x6 

185  200 

It  can  be  clearly  seen  from  Tables  4  and  6  that  the 
conventional  methods  give  comparable  results-some 
cases  are  in  very  good  agreement  with  measurements, 
and  others  are  far  off.  When  the  results  of  neural 
models  and  fuzzy  inference  systems  are  compared 
with  the  results  of  the  conventional  methods,  the 
results  of  all  neural  models  and  fuzzy  inference 
systems  are  better  than  those  predicted  by  the 
conventional  methods.  The  very  good  agreement 
between  the  measured  bandwidth  values  and  the 
computed  bandwidth  values  of  neural  models  and 
fiizzy  inference  systems  supports  the  validity  of  the 
artificial  intelligence  techniques  and  also  illustrates 
the  superiority  of  artificial  intelligence  techniques 
over  the  conventional  methods. 

A  distinct  advantage  of  neural  computation  is  that, 
after  proper  training,  a  neural  network  completely 
bypasses  the  repeated  use  of  complex  iterative 
processes  for  new  cases  presented  to  it.  For 
engineering  applications,  the  simple  models  are  very 
usable.  Thus  the  neural  models  given  in  this  work  can 
also  be  used  for  many  engineering  applications  and 
purposes. 
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Table  3.  Comparison  of  measured  and  calculated  bandwidths  obtained  by  using  neural  models  presented  in  this 


paper  for  electrically  thin  and  thick  rectangular  microstrip  antennas. 


Measured 

Present  Neural  Models 

Patch 

BWme(%) 

MLP 

RBFN 

[33,34] 

CGFR 

LM 

SCO 

RP 

BFGS 

CGPB 

CGPR 

BR 

OSS 

BPALR 

BPM 

DRS 

GA 

EDBD 

1 

1.070 

1.069 

1.071 

1.071 

1.070 

1.070 

1.070 

1.070 

1.070 

1.067 

1.071 

1.068 

1.400 

1.573 

1.048 

2 

2.200 

2.199 

2.200 

2.200 

2.201 

2.202 

2.200 

2.200 

2.200 

2.203 

2.200 

2.201 

2.182 

2.620 

2.292 

3 

3.850 

3.850 

3.850 

3.850 

3,850 

3.850 

3.851 

3.850 

3.850 

3.853 

3.837 

3.840 

3.336 

3.288 

3.849 

4 

1.950 

1.949 

1.950 

1.949 

1.950 

1.952 

1.950 

1.950 

1.950 

1.948 

1.945 

1.949 

1.951 

1.943 

1.899 

5 

2.050 

2.050 

2,050 

2,050 

2.051 

2.048 

2.050 

2.050 

2.050 

2.049 

2.061 

2.062 

2.210 

2.120 

2.077 

6 

5.100 

5.101 

5.100 

5.100 

5.099 

5.100 

5.099 

5.100 

5.100 

5.100 

5.097 

5.100 

5.223 

4.816 

5.024 

7 

4.900* 

4.560 

4.900 

4.766 

4.922 

4.764 

4.016 

4.011 

5.175 

4.137 

4.233 

4.300 

4.571 

4.506 

4.437 

8 

6.800 

6.800 

6.800 

6.800 

6.800 

6.798 

6.801 

6.800 

6.799 

6.811 

6.790 

6,801 

6.754 

7.076 

6.744 

9 

5.700 

5.699 

5.700 

5.700 

5.700 

5.700 

5.702 

5.700 

5.701 

5.702 

5.694 

5.701 

5,632 

5.470 

5.806 

10 

7.700* 

7.811 

7.763 

7.862 

8.132 

7.639 

7.719 

7.691 

7.869 

7.760 

7.705 

7.536 

7.891 

8.061 

7.968 

11 

10.900 

10.899 

10.901 

10.903 

10.900 

10.926 

10.900 

10.900 

10.900 

10.910 

10.902 

10.906 

11.285 

11.250 

11.080 

12 

9.300 

9.299 

9.299 

9.305 

9.299 

9.301 

9.300 

9.301 

9.302 

9.271 

9.301 

9.298 

9.425 

9.451 

9.471 

13 

10.000 

10.001 

10.001 

9.999 

10.001 

9.999 

9.999 

9.999 

9.998 

10.016 

9.980 

9.997 

9.983 

9.864 

9.813 

14 

16.000* 

15.954 

15.918 

16.161 

15.995 

15.890 

16.063 

16.100 

16.337 

16.396 

15.982 

16.182 

15.924 

16.167 

15.940 

15 

13.600 

13.601 

13.599 

13.595 

13.605 

13.548 

13.598 

13.600 

13.600 

13.598 

13.576 

13.593 

13.169 

13.135 

13.225 

16 

15.900 

15.899 

15.901 

15.899 

15.902 

15.919 

15.902 

15.901 

15.905 

15.900 

15.905 

15.906 

16,003 

16.086 

16.106 

17 

17.500 

17.499 

17.504 

17.496 

17.493 

17.496 

17.499 

17.502 

17.501 

17.482 

17.494 

17.499 

17.284 

17.516 

17.417 

18 

18.200* 

18.345 

18.422 

18.217 

18.179 

18.365 

18.297 

18.300 

18.311 

18.395 

18.458 

18.537 

18.340 

18.394 

18.381 

19 

17.900 

17.877 

17.847 

17.853 

17.824 

17.850 

17.871 

17.869 

17.860 

17.864 

17.861 

17.890 

17.947 

17.883 

17.996 

20 

18,000 

18.023 

18.055 

18.048 

18.066 

18.051 

18.035 

18.034 

18.019 

18.049 

18.036 

18.010 

18.129 

18.046 

18.170 

21 

19.000 

19.004 

19.004 

19.013 

19.026 

19.016 

18.999 

18.999 
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Table  4.  Bandwidths  obtained  from  the  conventional  methods  and  artificial  intelligence  techniques  available  in 


the  literature  for  electrically  thin  and  thick  rectangular  microstrip  antennas. 
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- - . 
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Table  5.  Train,  test  and  total  absolute  errors  between  the  measured  and  calculated  bandwidhs  for 


Artificial  Intelligence 
Techniques 

Learning 

Algorithms 

Train  Absolute 
Errors  (%) 

Test  Absolute 
Errors  (%) 

Total  Absolute 
Errors  (%) 

CGFR 

0.199 

0.770 

0.969 

LM 

0.194 

0.815 

1.009 

SCO 

0.174 

1.017 

1.191 

RP 

0.421 

0.779 
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0.824 

0.726 
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0.136 

1.499 

1.635 

MLP 
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1.048 
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GA 
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Fuzzy  Interfe 
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[36] 
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1.620 
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Neural  Models 
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2.315 

MLP 

DBD 

2.267 

0.862 

3.129 

in  the  Literature 

BP 

4.158 

0.804 

4.962 

[35] 

QP 

4.921 

0.895 

5.816 
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Table  6.  The  total  absolute  errors  between  the 
measured  and  calculated  bandwidths  for  the 
conventional  methods  in  the  literature. _ 
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Abstract 

A  new  method  for  the  robust  estimation  of  target 
orientation  using  measured  radar  cross  section  is 
proposed.  The  method  is  based  on  a  Generalized 
Regression  Neural  Network  (GRNN)  scheme.  The 
network  is  trained  by  the  FFT  modulus  of  bistatic 
radar  cross  section  data  sampled  at  the  receiver 
positions.  The  target  value  to  be  trained  is  the  angle 
between  a  defined  target  orientation  and  the  incident 
wave.  Results  based  on  actual  measurements  are 
presented. 

INTRODUCTION 

Accurate  estimation  of  target  orientation  is  essential 
in  range  profiling  schemes  [1-4].  In  such  cases,  the 
knowledge  of  target  orientation  can  yield  information 
about  the  target-structure.  The  range  profile  itself, 
however,  is  quite  sensitive  to  variations  in  target 
orientation  and  cannot  be  the  basis  for  such 
estimation.  A  detailed  tracking  of  object  orientation  is 
therefore  necessary. 

Attempts  have  been  made  to  use  artificial  neural 
networks  (ANNs)  for  solving  the  inverse  problem. 
However,  the  proposed  methods  have  not  been  able 
to  exploit  the  fundamental  advantages  of  neural 
systems,  whichare  their  speed  and  robustness.  In 
many  instances,  the  problem  formulation  was  fitted 
into  previously  developed  algorithms  for  network 
training  [5,  6].  Nevertheless,  successful  methods 
were  developed  for  cases  where  a  priori  knowledge 
of  the  target  geometry  is  available  [7].  Neural 
networks  have  proven  to  do  well  in  target 
classification  area.  A  spectral  approach  to  radar  target 
classification  using  ANNs  was  proposed  in  [8]. 

The  Generalized  Regression  Neural  Network 
(GRNN)  [9]  is  among  radial  basis  networks  and  has 
found  applications  in  regression  and  function 


estimation  processes.  It  has  been  shown  that  given  a 
sufficient  number  of  neurons  in  the  hidden  layer,  a 
GRNN  can  approximate  a  continuous  function  to  an 
arbitrary  precision  [10]. 

In  this  paper,  the  orientation  of  a  cylindrical 
conducting  target  is  estimated  with  a  GRNN  network 
using  radar  cross  section  data.  The  definition  of  the 
problem  is  shown  in  Figure  1,  where  a  target  is 
illuminated  by  a  number  of  transmitters/receivers  at 
different  angles  of  incidence.  The  orientation  angle  is 
defined  as  the  angle  between  a  preferred 
direction  specified  on  the  target  geometry  and  the 
incident  wave.  The  task  is  to  find  the  orientation 
angle  by  using  a  number  of  bistatic  radar 
measurements. 

THE  FORWARD  PROBLEM 

Consider  a  perfectly  conducting  cylinder  of  arbitrary 
cross  section  shape,  as  shown  in  Figure  2,  illuminated 
by  a  plane  wave  in  free  space.  The  cylindrical 

contour  is  denoted  by  C.  For  the  TM  polarization, 
the  electric  field  integral  equation  (EFIE)  is  given  by 

E[  (p)  =  ^  Ip  -  p'W  (1) 

^  c 

where  p  and  p'  are  the  field  and  source  points, 
respectively,  and  is  the  zeroth  order  Hankel 

function  of  the  second  kind. 

The  above  integral  equations  are  solved  numerically 
by  the  method  of  moments.  Once  the  induced  current 
is  calculated,  the  scattering  echo  width  is  given  by 
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THE  GRNN 


The  Generalized  Regression  Neural  Network  belongs 
to  the  family  of  radial  basis  neural  networks.  Radial 
basis  networks  require  more  neurons  than  standard 
feed-forward  backpropagation  networks,  but  they  can 
often  be  designed  in  a  fraction  of  the  time  it  takes  to 
train  standard  feed-forward  networks.  They  work  best 
when  many  training  vectors  are  available. 


Radial  basis  networks  were  previously  used  in  field 
estimation  processes.  It  is  shown  that  given  a 
sufficient  number  of  neurons  in  the  hidden  layer,  a 
GRNN  can  approximate  a  continuous  function  to  an 
arbitrary  precision.  The  GRNN  is  a  memory  based 
network,  which  provides  estimates  of  continuous 
variables  and  converges  tot  he  underlying  optimal 
linear  or  nonlinear  regression  surface.  The  network 
requires  no  prior  knowledge  of  a  specific  functional 
from  between  input  and  output.  The  appropriate  form 
is  expressed  as  a  probability  density  function  that  is 
empirically  determined  from  observed  data  using 
Parzen  window  estimation  [11].  For  this  reason,  it 
works  very  well  with  sparse  data.  The  network  is  a 
one-pass  learning  algorithm  and  can  generalize  from 
examples  as  soon  as  they  are  stored.  The  structure  of 
the  Network  is  depicted  in  Figure  3. 


Let  jc  be  a  vector  random  variable  of  dimension  p, 
and  y  be  a  scalar  random  variable.  Then  f(x,y)  is  the 
joint  continuous  probability  density  function  of  a;  and 
y.  Let  X  be  a  particular  value  of  the  random  variable 
X.  The  conditional  mean  ofy  given  AT  (regression  ofy 
on  X)  is  given  by 

tc 

\yf  {'^,y)dy 

^f{X,y)dy 


But  the  probability  density  function  f(x,y)  is  not  known  a 
priori.  It  may  be  estimated  from  a  sample  of  observations  of 
X  andy  as  proposed  by  Parzen  as  [9] 


£y'exp 


E[y\X]  =  - 


(T 


£exp 


cr 


(4) 


where 


C,^±\X,-X‘\  (5) 

is  the  city  block  distance.  Note  that  in  (4),  tT  is  the 
spread  parameter  of  the  density  estimator,  and  should 


not  be  confused  with  the  echo-width  defined  in  (2). 
The  estimate  (4)  can  be  considered  as  a  weighted 
average  of  all  the  observed  values,  each  being 
weighted  exponentially  aecording  to  its  distance  from 
X.  It  can  be  shown  that  this  density  estimator  used  in 
estimating  (3)  asymptotically  converges  to  the 
underlying  probability  density  function  f(x,y)  at  all 
points  (x,y)  at  which  the  density  function  is 
continuous,  provided  that  the  spread  parameter 
(7  =  is  chosen  as  a  decreasing  function  of  n. 

When  a  is  large,  the  estimated  density  function 
approaches  a  multivariate  Gaussian  function.  For 

intermediate  values  of  (T ,  all  values  of  Y'  are  taken 
into  account,  but  those  corresponding  to  points  closer 
to  X  are  weighted  heavier.  The  estimate  cannot 
converge  to  poor  solutions  corresponding  to  local 
minima  of  the  error  criterion. 


TRAINING 

The  sensors  are  assumed  to  be  fixed  with  respect  to 
the  wave  direction.  The  target  is  impinged  upon  by 
transverse  magnetic  plane  waves  from  different 
directions.  To  prepare  the  training  data,  a  total  of  10 
equally  spaced  receivers  are  used. 

It  was  found  that  the  FFT  modulus  of  the  echo-width 
patterns  sampled  at  the  receiver  positions  for  angles 
of  incidence  provided  better  generalization 
capabilities  for  the  network,  compared  with  the  case 
when  the  network  was  trained  with  the  echo-width 
vector  (amplitude  and  phase).  Simulated  bistatic 
echo-width  was  used  for  the  training  of  the  network. 
The  forward  problem  was  solved  using  the  method  of 
moments.  These  calculations  formed  a  10  element 
input  vector  at  every  receiver  for  the  network. 

Some  noisy  data  created  by  displacing  the  receivers, 
were  added  to  the  training  data  set  to  let  the  system 
face  small  sensor  position  drifts.  These  vectors  were 
used  in  training  the  network.  The  spread  parameter 
a  was  manipulated  so  that  the  network  angular 
estimation  was  sufficiently  robust.  The  target  value  to 
be  trained  was  the  angle  between  the  target 
orientation  and  the  incident  wave. 


RESULTS 

In  this  section,  the  performance  of  the  network  will 
be  examined. 

The  network  was  trained  using  the  data  described  in 
the  previous  section  for  the  triangular  shaped  target 
shown  in  Figure  1 .  To  check  the  generalization  power 


KABIRI,  et  al.:  NEURAL  NETWORKS  FOR  ESTIMATION  OF  TARGET  ORIENTATION 


123 


of  the  network,  a  set  of  40  new  input  data  was 
produced,  this  time  by  angles  not  previously 
encountered  by  the  network  with  all  other  parameters 
held  unchanged.  Figure  4  shows  the  cumulative  error 
for  estimating  the  target  orientation  for  trained  and 
untrained  data.  The  network  clearly  displays  a  very 
good  level  of  generalization  of  its  estimates  based  on 
the  training  data  set  and  more  than  99%  of  the  cases 
have  less  than  one-degree  error. 

The  triangular  cylindrical  target  shown  in  Figure  5 
was  considered  next.  This  is  the  Ipswich  target  IPS- 
009.  The  network  was  trained  using  the  simulated 
echo-width  data.  Then  the  network  was  presented 
with  the  RCS  data  collected  at  Ipswich  bistatic  RCS 
range.  Only  the  bistatic  echo-width  data  on  180 
degree  range  was  used  (that  is,  one  side  of  the 
cylinder  was  examined).  The  performance  of  the 
network  in  estimating  the  orientation  of  the  target  is 
shown  in  Figure  6.  It  is  observed  that  the  error  is  less 
than  one  degree  in  more  than  98%  of  the  cases. 

Next,  the  performance  of  the  network  was  examined 
for  the  case  when  the  target  size  is  not  exactly  known 
a  priori,  but  rather  the  geometry  of  its  shape  is 
known.  The  system  was  tested  in  facing  an  elliptical 
cylinder  when  the  electrical  size  of  the  cross-section 
was  rescaled  from  -7%  to  +5%.  The  cumulative  error 
is  shown  in  Figure  7  for  three  different  frequency 
scaling  factors  in  the  above  range. 

CONCLUDING  REMARKS 

The  problem  of  estimation  of  two-dimensional 
conducting  target  orientation  was  efficiently  handled 
by  a  Generalized  Regression  Neural  Network.  The 
training  data  set  consisted  of  the  calculated  bistatic 
echo-width  data  when  the  target  was  exposed  by  an 
incident  single  frequency  TM  plane  wave.  The 
performanee  of  the  network  does  not  change  if  the 
frequency  of  the  plane  wave  is  altered.  Currently,  The 
network  performance  against  sensor  misplacements, 
sensor  noise  (correlated  and  uncorrelated),  are  under 
study. 

It  is  believed  that  time  domain  schemes  such  as  range 
profiling  techniques  can  utilize  this  method  to 
overcome  difficulties  in  estimating  the  orientation  of 
the  target. 
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Figure  1-  Problem  set-up. 
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Figure  3-  The  structure  of  GRNN. 
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Figure  4-  The  GRNN  estimates  the  orientation  of  the  target  shown 
in  Figure  1  by  the  concept  of  generalization. 


Figure  5-  A  triangular  cylinder  of  sides  10.5cm  x  4.92cm  x  10.5cm  illuminated 
by  a  10  GHz  TM^  plane  wave.  The  height  of  the  cylinder  is  40.8  cm. 


Percent  of  Cases 
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Figure  6-  Error  diagram  for  the  network  response  for  the  target  shown  in  Figure  5. 
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Application  of  Two-Dimensional  AWE  Algorithm 
in  Training  Multi-Dimensional  Neural  Network  Model 
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ABSTRACT  1  INTRODUCTION 

Artificial  neural  network  (ANN)  plays  very 

important  role  in  microwave  engineering.  Artificial  neural  networks  (ANNs)  have  emerged 


Training  a  neural  network  model  is  the  key  of 
neural  network  technique.  The  conventional 
methods  for  training,  such  as  method  of  moment 
(MoM),  are  time-consuming  when  the  training 
parameters  are  a  bit  more.  In  order  to  aid  the 
training  process  by  reducing  the  amount  of  costly 
and  time-consuming  sampling  cycles,  a  lot  of 
algorithms  have  been  developed,  such  as 
asymptotic  waveform  evaluation  (AWE).  In  this 
paper,  MoM  in  conjunction  with  the 
two-dimensional  AWE  is  applied  to  accelerate  the 
process  of  training  the  neural  network  model 
based  on  the  input  impedance  response  on 
frequency  and  that  on  other  parameters  of  a 
microstrip  antenna.  In  AWE  method,  the 
derivatives  of  Green's  function  are  required.  A 
closed  form  of  microstrip  Green 's  function  is  used 
for  this  requirement.  Then,  the  derivative  matrices 
respect  to  both  frequency  and  permittivity  can  be 
obtained  from  the  original  matrix.  With  these 
matrices  in  hand,  coefficients  of  the 
two-dimensional  Fade  polynomial  can  be 
obtained.  So  the  sampling  data  for  training 
neural  network  model  can  be  obtained  and  the 
process  of  training  neural  net  model  can  be 
completed  quickly  and  accurately.  Numerical 
results  demonstrate  the  efficiency  of  this 
technique. 

KEY  TERMS 

AWE,  neural  network,  microstrip  antennas 


as  a  powerful  technique  for  modeling  general 
input/output  relationships.  ANNs  provide 
electromagnetically  trained  ANN  (EM-ANN) 
models  for  use  in  CAD  ofRR/microwave  circuits, 
antennas,  and  systems  [1].  The  training  is  the 
most  important  step  in  the  development  of  ANNs. 
The  actual  training  process  involves  algorithms 
for  finding  values  of  weights  associated  with 
various  neurons.  This  process  can  be  viewed  as  an 
optimization  one.  Various  well-known 
optimization  techniques,  such  as  genetic 
algorithms  and  so  on,  can  be  used  for  this  purpose. 
This  process  is  quite  time-consuming.  For 
example,  to  train  an  ANN  which  is  available  in  a 
wide  frequency  band,  the  computation  should  be 
carried  out  repeatedly  at  different  frequencies.  To 
overcome  this  difficulty,  the  space-mapping  (SM) 
technique  has  been  introduced  [2].  This  technique 
establishes  a  mathematical  link  between  the 
coarse  and  the  fine  models  and  directs  the  bulk  of 
the  CPU-intensive  computation  to  the  coarse 
model,  while  preserving  the  accuracy  offered  by 
the  fine  model.  Alternatively,  the  asymptotic 
waveform  evaluation  (AWE)  has  also  been 
applied  in  finite  difference  solution  [3-6].  This 
technique  extrapolates  the  data  from  one  point  to 
a  certain  range  based  on  the  value  and  the  high 
order  derivatives  at  this  point.  From  this  concept, 
it  is  seen  that  this  technique  is  computationally 
efficient  due  to  involving  the  analytical 
relationships  and  is  available  to  the  cases  where 
the  derivatives  may  be  obtained.  AW^  requires 
the  derivatives  of  Green’s  functions,  so  it  is  often 
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used  for  free-space  problems  [7-8].  In  this  paper, 
the  2-D  AWE  has  been  developed  to  extrapolate 
the  responses  over  frequency  and  permittivity 
simultaneously  to  characterize  microstrip 
antennas,  so  the  response  over  certain  frequency 
and  permittivity  ranges  can  be  extrapolated  from 
single  point  accurately  and  quickly.  To  check  the 
validity  of  this  method,  the  analysis  of  microstrip 
patch  antenna  is  chosen  as  an  example  by  using 
method  of  moments  (MoM).  The  variables  in  the 
model  are  frequency,  relative  permittivity, 
position  of  feed  line  and  the  dimension  of  the 
patch.  In  the  training  process,  two-dimensional 
AWE  is  responsible  for  providing  the  response  of 
both  frequency  and  relative  permittivity 
simultaneously  within  certain  range. 

2  FORMULAS 


^n-i  ^ _ _ 

(2.3) 

Where  kg  is  the  wave  number  on  the  expansion 
point,  a„„  denote  the  unknown  coefficients,  and 
PxQ  denotes  the  total  number  of  such 
coefficients. 

In  order  to  get  the  coefficients  a„„,  the  derivatives 
of  matrix  I  have  to  be  generated.  A  closed  form 
Green’s  function  Gl'\p)  that  is  easy  to  get 
derivatives  is  used  [10]. 


.-Ap  _  1 

Kp 


(2.4) 


2.1  Two  dimensional  AWE  method  [9] 


MoM  with  the  substrate  Green’s  function  usually 
results  in  a  matrix  equation  in  the  following  form: 
Z{k,s,)l{k,s,)  =  V{k,s,)  (2.1) 

Where  Z  is  a  square  matrix  and  only  can  be 
determined  by  the  object  analyzed,  7  is  an 
unknown  vector  of  the  induced  currents  on  the 
patch,  F  is  a  known  vector  associated  with  the 
source  or  excitation,  and  k  is  the  wave  number 
and  is  permittivity.  In  accordance  with  the 
AWE  method,  I{k,S^)  is  expanded  into  a 
two-dimensional  Taylor  series  to  obtain  the 
solutions  of  (2.1)  over  certain  frequency  and 
permittivity  ranges. 

I(.k,£ro)~^'^^nm(.k-kg)  -£,(,) 

fjssO  m=0 


(2.2) 
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nm 


=  Z‘ 


1 


■Cl 


Qm+ny 

d"kd"'s. 


n-\  m-\  1 

yy  a- - X 


L,{z)  =  Hg{z)-Y,{z)  (2.5) 

Where  a  is  defined  as  a  =  ~  1)  /  > 

Ho  and  Yq  are  the  Struve  and  Neumann  functions 
of  zero  order  and  p  =  .^x  -x'Y  +{y-  yY  ■  The 
Green’s  function  G^p{p)  in  (2.4)  is  valid 
subject  to  the  conditions  kgp(kgh/ £^)^  «1 
and  kgp{p,kgh)^  « 1 . 

The  Taylor  expansion  has  a  limited  bandwidth.  To 
obtain  a  wider  bandwidth,  we  represent 
I{k,  )  with  a  better  rational  Pad6  function: 

1=0  m=0 

(2.6) 

Where  Cqq  =  1  and  XF+YG+X-tF-t-Y-i-G+l 
=PQ-i-P+Q.  If  we  make  Y=Q  the  unknown 
coefficients  bjj  and  cy  can  be  calculated  by 
substituting  (2.2)  into  (2.6),  multiplying  (2.6)  by 
the  denominator  of  the  Pad6  expansion,  and 
matching  the  coefficients  of  the  equal  powers  of 
k-ko  and  f ,  -  .  This  leads  to  the  matrix 

equation  (2.7).  Where  n  is  from  1  to  Y.  If  we  solve 
equation  (2.7)  in  turn,  6y  and  Cy  can  be  obtained. 
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and  the  current  vector  /(^,£-,.)  can  be  obtained 
by  the  calculated  Pad6  model. 

2.2  Neural  networks  [11] 

Multilayer  perceptrons  (MLP)  are  the  most 
popular  type  of  neural  networks  in  use  today. 
Typically,  an  MLP  neural  network  consists  of  an 
input  layer,  one  or  more  hidden  layer,  and  an 
output  layer,  as  shown  in  Fig.  2. 1 .  The  top  layer  is 
the  ou^ut  layer  and  the  input  impedance  and 
other  scattering  parameters  can  be  outputted.  The 
bottom  layer  is  the  input  layer,  and  four 
parameters,  frequency,  relative  permittivity, 
position  of  feed  line  and  the  dimension  of  patch, 
are  inputted.  The  other  two  layers  are  hidden 
layers,  and  it  can  be  automatic  treated  in  the 
software  [12]. 
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2.3  Hybrid  of  AWE  and  ANN 
AWE  method  is  an  accurate  and  efficient 
technique  that  is  based  on  the  electromagnetic 
mechanism.  In  the  practical  application,  the 
response  varied  with  frequency  and  permittivity 


can  be  simultaneously  obtained  by  the 
two-dimensional  AWE  method.  In  AWE,  the 
differentiation  operates  on  the  Green’s  function 
which  does  not  involve  the  dimensions  of  the 
object  to  be  analyzed.  Therefore  it  is  not  available 
to  obtain  the  response  with  respect  to  the 
dimensions  through  AWE.  In  this  case,  the 
sampling  data  for  training  variables  respect  to  the 
dimension  of  the  microstrip  and  position  of  feed 
line  can  only  be  calculated  point  by  point.  Even  in 
this  case,  the  speed  of  training  is  about  one  or  two 
orders  faster  than  that  of  direct  training.  With  the 
two-dimensional  AWE  method  and  neural 
network  technique  in  hand,  we  can  accurately  and 
efficiently  construct  the  neural  network  model. 
The  flowchart  is  shown  in  Fig.  2.2. 
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As  the  neural  network  model  is  constructed,  the 
response  of  object  varied  with  each  parameter  can 
be  immediately  obtained.  This  trained  model  may 
be  used  in  the  optimization  of  microstrip 
structures  other  than  microstrip  antennas. 
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3  NUMERICAL  RESULTS  AND 
DISCUSSION 

The  example  is  a  microstrip  antenna  consisting  of 
a  conducting  patch  residing  on  a  dielectric 
substrate  having  thidcness  h=0.787mm  (Fig.  3.1). 
The  increments  of  frequency,  relative  permittivity, 
position  of  feed  line  H  and  dimension  of  patch  L 
are  O.OlGHz,  0.01,  0.1mm  and  0.01mm 
respectively.  In  order  to  get  the  response  under 
the  following  specification;  the  frequency  varies 
from  7.5GHz  to  8.4GHz;  the  permittivity  varies 
from  1.8  to  2.8;  the  dimension  L  varies  from 
12.0mm  to  13.0mm;  the  feed  line  position  H 
varies  from  8.79  to  0.79,  the  direct  method 
requires  7.5x1  o’ seconds  to  obtain  the  solution 
on  a  Personal  Computer  (1.2GHz  AMD  K7 
processor).  With  general  neural  network 
algorithm,  including  the  training  time,  to  obtain 
the  same  accuracy,  1.4x10^  seconds  are  need. 
But  with  hybrid  method,  only  1.2 x lO’seconds 
are  needed,  which  is  6.3x10“'  times  faster  than 
the  direct  method  and  1.2x10'  times  faster  than 
the  general  neural  network  method  (Table  3.1). 
The  training  process  of  the  soflware- 
“Neuralmodeler”  is  shown  in  Fig.  3.2  and  the 
final  error  is  less  than  0.01.  Figure  3.3  and  Figure 
3.4  show  the  real  and  imaginary  parts  of  the  input 
impedance  as  a  function  of  frequency,  relative 
permittivity,  dimension  L  and  position  H  by  using 
the  hybrid  method  of  the  two  dimensional  AWE 
method  and  neural  network  algorithm, 
respectively  (Due  to  difficulty  in  presenting  four 
dimensional  figure,  the  variables,  dimension  L 
and  position  H,  are  fixed).  When  this  four 
variables  neural  network  model  for  this  antenna 
patch  is  obtained,  consequently  the  optimizing 


and  designing  will  be  an  easy  job.  This  neural 
network  model  gives  the  complete 
characterization  of  the  microstrip  patch  antenna. 
Because  of  its  computational  efficiency,  it  is 
realizable  in  the  optimization  and  the  observation 
of  the  sensitivity  of  the  parameters,  such  as  the 
relative  permittivity  (Fig.  3.5).  In  this  example, 
only  four  variables  are  involved.  It  is  observed 
that  the  more  the  variables  to  be  optimized,  the 
more  the  reduction  of  the  computer  time. 

4  CONCLUSION 

The  AWE  algorithm  has  been  extended  from  one 
dimensional  to  two  dimensional  cases.  This 
extension  results  in  the  extrapolation  for  two 
variables  simultaneously.  Compared  to  the 
one-dimensional  AWE,  the  computer  time  is 
further  reduced  significantly.  The  hybrid  of  AWE 
and  ANN  makes  full  use  of  the  advantages  of 
both  algorithms.  Numerical  results  demonstrate 
the  efficiency  of  this  hybrid  scheme. 
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Table  3.1  The  compare  of  neural  network  methods  and  direct  method 
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1.4xl0*s 
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Frequency/GHz 


Figure  3.5  Sensitivity  ofthe  permittivity  (L=12.5mm,  H=8.79mm). 


5  REFERENCES 

[1]  V.  Devabhaktuni,  M.  C.  E.Yagoub,  Y.  Fang, 
J.  Xu  and  Q.  J.  Zhang,  “Neural  networks  for 
microwave  modeling:  model  development 
issues  and  nonlinear  modeling  technique,” 
International  Journal  of  RF  and  Microwave 
CAE,  (invited,  to  be  published). 

[2]  J.  W.  Sandler,  M.  A.  Ismail,  J.  E.  Rayas- 
Sanchez  and  Q.  J.Zhang,  “Neuromodeling  of 
microwave  circuits  exploiting  space 
mapping  technology,”  IEEE  Trans. 
Microwave  Theory  Tech.,  vol.  47,  pp. 
2417-2427,  December  1999. 

[3]  E.  K.  Miller,  “Model-based  parameter 
estimation  in  electromagnetic  Pt.l,”  IEEE 
Antennas  and  Propagation  Magazine,  vol. 
40,  no.  1,  pp.  40-52,  1998. 

[4]  E.  K.  Miller,  “Model-based  parameter 
estimation  in  electromagnetic  Pt.  2,”  IEEE 
Antennas  and  Propagation  Magazine,  vol. 
40,  no. 1,  pp.  51-65, 1998. 

[5]  E.  K.  Miller,  “Model-based  parameter 
estimation  in  electromagnetic  Pt.  3,”  IEEE 
Antennas  and  Propagation  Magazine,  vol. 


40,  no.  3,  pp.  49-66, 1998. 

[6]  M.  Li,  Q.  J.  Zhang  and  M.S.Nakhla,  “Finite 
difference  solution  of  EM  fields  by 
asymptotic  waveform  techniques,”  lEE 
Proceedings  on  Microwaves,  Antennas  and 
Propagation,  vol.  143,  pp.  512-520, 1996 

[7]  D.  Jiao  and  J.  M.  Jin,  “Asymptotic 
Waveform  Evaluation  for  Scattering  by  a 
Dispersive  Dielectric  Object,”  Microwave 
Opt.  Tech  Lett.,  vol.  24,  no.  4,  pp.  232-234, 
Feb.  2000 

[8]  C.  M.  Tong  and  W.  Hong,  “Fast  Calculation 
of  Wide  Angle  Mono-static  RCS  of 
Dielectric  Cylinders  Based  on  Asymptotic 
Wave  Evaluation  Technique,”  Chinese 
Journal  of  Radio  Science,  vol.  16,  no.  1,  pp. 
72-75,  2001 

[9]  Y.  Xiong,  D.  G  Fang  and  F.  Ling, 
“Two-Dimensional  AWE  Technique  in  Fast 
Calculation,”  ICMMT’2002,  pp.  393-396 

[10]  Ahmad  Hoorfar,  “Simple  Closed-Form 
Expressions  for  Microstrip  Green’s 
Functions  in  a  Magneto-dielectric 
Substrate,”  Mircrowave  Opt.  Tech.  Letters, 
vol.  8,  no.l,  pp.  33-36,  Jan.  1995 


XIONG,  et  al..:  AWE  ALGORITHM  IN  TRAINING  NEURAL  NETWORK  MODEL 


135 


[11]  Q.  J.  Zhang  and  K.C.Cupta,  Neural 
Networks  for  RF  and  Microwave  Design, 
Artech  House,  2000. 

[12]  Q.  J.  Zhang  and  his  research  team,  software 
“NeuralModeler",  Version  1.2.2  for 
windows  NT  4.0 

‘  Xiong  Ye  was  bom  in 
'  '  JiangSu  province  in  October 

\  1978.  He  received  the 

J  bachelor  degree  in  electro- 

I  magnetics  and  microwave 

.  "  technology  from  Nanjing 

University  of  Science  and 
Technology  (NUST)  in  1996. 

At  present  he  is  pursing  postgraduate  degree  in 
millimeter  wave  technique  laboratory  in  NUST, 
His  main  research  interests  are  radar  image 
analysis,  radar  cross  section  prediction  and  AWE 
technique. 


FANGDagai^  (SM’90,F’03) 
was  bom  in  Shanghai  in  June 
1937.  He  graduated  from 
graduate  school  of  Beijing 
Institute  of  Post  and 
Telecommunications  in  1966. 
From  1980  to  1982,  he  was 
Visiting  Scholar  at  Laval 
University  and  University  of 
Waterloo  both  in  Canada.  He  has  been  a 
Professor  of  Nanjing  University  of  Science  and 
Technology  since  1986.  He  was  qualified  as 
Ph.D.  student  supervisor  in  1990.  He  had  been 
Visiting  Professor  of  six  universities  in  Canada 
and  in  Hong  Kong  since  1987.  He  is  IEEE 
Fellow  and  on  the  editorial  board  of  IEEE  Trans. 
MTT  and  several  other  journals.  He  also  serves 
as  Vice  Editor  of  ‘The  Chinese  Journal  of 
Microwaves”.  He  has  co-edited  one  proceedings 
of  international  conference,  authored  or 
co-authored  one  book,  one  textbook,  two  book 
chapters  and  over  300  papers  including  more 
than  40  papers  published  in  the  international 
journals.  His  research  interests  include 
computational  electromagnetics,  microstrip 


antennas,  EM  scattering  and  microwave  imaging 
and  smart  antenna. 


Ru-Shan  Chen  received  the 
B.Sc.  degree  in  1987  and 
M.Sc.  degree  in  1990  both 
from  Dept,  of  Radio 
engineering.  Southeast 
University  and  PhD  from 
Dept  of  Electronic 
Engineering,  City  University  of  Hong  Kong  in 
2001.  He  joined  the  Dept,  of  Electrical 
Engineering,  Nanjing  University  of  Science  & 
Technology  (NUST),  where  he  was  a  Teaching 
Assistant  in  1990,  and  Lecture  later  in  1992. 
Since  Sept  1996,  he  was  a  Visiting  Scholar  in 
Dept,  of  Electronic  Engineering,  City  University 
of  Hong  Kong,  first  as  RA,  became  a  Senior  RA 
in  July  1997  and  a  RF  in  April  1998.  From  June 
to  Sept  1999,  He  was  also  a  Visiting  Scholar  at 
Montreal  University,  Canada  In  Sept  1999,  he 
was  promoted  to  be  a  full  Professor  and 
Associate  Director  of  Microwave  & 

Communication  Center  in  NUST  and  Associate 
Head,  Dept  of  Communication  Engineering, 
NUST  in  2001.  He  ever  received  the  1992 
third-class  science  and  technology  advance  prize, 
the  1993  third-class  science  and  technology 
advance  prize  and  the  1996  second-class  science 
and  technology  advance  prize  given  by  National 
Education  Committee  of  China,  the  1999 
first-class  science  and  technology  advance  prize 
given  by  Jiang  Su  Province,  the  2001 
second-class  science  and  technology  advance 
prize.  He  has  authored  or  co-authored  more  than 
120  papers  including  60  paper  in  international 
journals.  His  research  interests  mainly  include 
microwave/millimeter  wave  system,  measure¬ 
ments,  antenna,  circuit  and  computational 
electromagnetics. 


ACES  COPYRIGHT  FORM 


This  form  is  intended  for  original,  previously  unpublished  manuscripts  submitted  to  ACES  periodicals  and  conference  publications.  The  signed  form, 
appropriately  completed,  MUST  ACCOMPANY  any  paper  in  order  to  be  published  by  ACES.  PLEASE  READ  REVERSE  SIDE  OF  THIS  FORM  FOR 
FURTHER  DETAILS. 

TITLE  OF  PAPER:  RETURN  FORM  TO: 

Dr.  Atef  Z.  Elsherbeni 
University  of  Mississippi 
Dept,  of  Electrical  Engineering 

AUTHORS(S)  Anderson  Hall  Box  13 

PUBLICATION  TITLE/DATE:  University,MS  38677  USA 


PART  A  -  COPYRIGHT  TRANSFER  FORM 

(NOTE:  Company  or  other  forms  may  not  be  substituted  for  this  form.  U.S.  Govermnent  employees  whose  work  is  not  subject  to  copyright  may  so  certify 
by  signing  Part  B  below.  Authors  whose  work  is  subject  to  Crown  Copyright  may  sign  Part  C  overleaf). 

The  undersigned,  desiring  to  publish  the  above  paper  in  a  publication  of  ACES,  hereby  transfer  their  copyrights  in  the  above  paper  to  The  Applied 
Computational  Electromagnetics  Society  (ACES).  The  undersigned  hereby  represents  and  warrants  that  the  paper  is  original  and  that  he/she  is  the  author 
of  the  paper  or  otherwise  has  the  power  and  authority  to  make  and  execute  this  assignment. 

Returned  Rights:  In  return  for  these  rights,  ACES  hereby  grants  to  the  above  authors,  and  the  employers  for  whom  the  work  was  performed,  royalty-free 
permission  to: 

1.  Retain  all  proprietary  rights  other  than  copyright,  such  as  patent  rights. 

2.  Reuse  all  or  portions  of  the  above  paper  in  other  works. 

3.  Reproduce,  or  have  reproduced,  the  above  paper  for  the  author’s  personal  use  or  for  internal  company  use  provided  that  (a)  the  source  and 
ACES  copyright  are  indicated,  (b)  the  copies  are  not  used  in  a  way  that  implies  ACES  endorsement  of  a  product  or  service  of  an  employer,  and  (c)  the 
copies  per  se  are  not  offered  for  sale. 

4.  Make  limited  distribution  of  all  or  portions  of  the  above  paper  prior  to  publication. 

5.  In  the  case  of  work  performed  under  U.S.  Government  contract,  ACES  grants  the  U.S.  Government  royalty-free  permission  to  reproduce  all 
or  portions  of  the  above  paper,  and  to  authorize  others  to  do  so,  for  U.S.  Government  purposes  only. 

ACES  Obligations:  In  exercising  its  rights  under  copyright,  ACES  will  make  all  reasonable  efforts  to  act  in  the  interests  of  the  authors  and  employers  as 
well  as  in  its  own  interest.  In  particular,  ACES  REQUIRES  that: 

1.  The  consent  of  the  first-named  author  be  sought  as  a  condition  in  granting  re-publication  permission  to  others. 

2.  The  consent  of  the  undersigned  employer  be  obtained  as  a  condition  in  granting  permission  to  others  to  reuse  all  or  portions  of  the  paper  for  promotion 
or  marketing  purposes. 

In  the  event  the  above  paper  is  not  accepted  and  published  by  ACES  or  is  withdrawn  by  the  authoifs)  before  acceptance  by  ACES,  this  agreement  becomes 
null  and  void. 


AUTHORIZED  SIGNATURE  TITLE  (IF  NOT  AUTHOR) 

EMPLOYER  FOR  WHOM  WORK  WAS  PERFORMED  DATE  FORM  SIGNED 

Part  B  -  U.S.  GOVERNMENT  EMPLOYEE  C  ERTIFICATION 

(NOTE:  if  your  work  was  performed  under  Government  contract  but  you  are  not  a  Government  employee,  sign  transfer  form  above  and  see  item  5  under 
Returned  Rights). 

This  certifies  that  all  authors  of  the  above  paper  are  employees  of  the  U.S.  Government  and  performed  this  work  as  part  of  their  employment  and  that  the 
paper  is  therefor  not  subject  to  U.S.  copyright  protection. 


AUTHORIZED  SIGNATURE 


TITLE  (IF  NOT  AUTHOR) 


NAME  OF  GOVERNMENT  ORGANIZATION 


DATE  FORM  SIGNED 


PART  C  -  CROWN  COPYRIGHT 


(NOTE:  ACES  recognizes  and  will  honor  Crown  Copyright  as  it  does  U.S.  Copyright.  It  is  vmderstood  that,  in  asserting  Crown  Copyright,  ACES  in  no 
way  diminishes  its  rights  as  publisher.  Sign  only  if  ALL  authors  are  subject  to  Crown  Copyright). 

This  certifies  that  all  authors  of  the  above  Paper  are  subject  to  Crown  Copyright.  (Appropriate  documentation  and  instmctions  regarding  form  of  Crown 
Copyright  notice  may  be  attached). 


AUTHORIZED  SIGNATURE  TITLE  OF  SIGNEE 


NAME  OF  GOVERNMENT  BRANCH  DATE  FORM  SIGNED 


Information  to  Authors 


ACES  POLICY 

ACES  distributes  its  technical  publications  throughout  the  world,  and  it  may  be  necessary  to  translate  and  abstract  its  publications,  and  articles  contained 
therein,  for  inclusion  in  various  compendiums  and  similar  publications,  etc.  When  an  article  is  submitted  for  publication  by  ACES,  acceptance  of  the 
article  implies  that  ACES  has  the  rights  to  do  all  of  the  things  it  normally  does  with  such  an  article. 

In  connection  with  its  publishing  activities,  it  is  the  policy  of  ACES  to  own  the  eopyrights  in  its  technical  publications,  and  to  the  contributions  contained 
therein,  in  order  to  protect  the  interests  of  ACES,  its  authors  and  their  employers,  and  at  the  same  time  to  facilitate  the  appropriate  re-use  of  this  material 
by  others. 

The  new  United  States  copyright  law  requires  that  the  transfer  of  copyrights  in  each  contribution  from  the  author  to  ACES  be  confirmed  in  writing.  It  is 
therefore  necessary  that  you  execute  either  Part  A-Copyright  Transfer  Form  or  Part  B-U.S.  Government  Employee  Certification  or  Part  C-Crown 
Copyright  on  this  sheet  and  return  it  to  the  Managing  Editor  (or  person  who  supplied  this  sheet)  as  promptly  as  possible. 

CLEARANCE  OF  PAPERS 

ACES  must  of  necessity  assume  that  materials  presented  at  its  meetings  or  submitted  to  its  publications  is  properly  available  for  general  dissemination  to 
the  audiences  these  activities  are  organized  to  serve.  It  is  the  responsibility  of  the  authors,  not  ACES,  to  determine  whether  disclosure  of  their  material 
requires  the  prior  consent  of  other  parties  and  if  so,  to  obtain  it.  Furthermore,  ACES  must  assume  that,  if  an  author  uses  within  his/her  article  previously 
published  and/or  copyrighted  material  that  permission  has  been  obtained  for  such  use  and  that  any  required  credit  lines,  copyright  notices,  etc.  are  duly 
noted. 

AUTHOR/COMPANY  RIGHTS 

If  you  are  employed  and  you  prepared  your  paper  as  a  part  of  your  job,  the  rights  to  your  paper  initially  rest  with  your  employer.  In  that  case,  when  you 
sign  the  copyright  form,  we  assume  you  are  authorized  to  do  so  by  your  employer  and  that  your  employer  has  consented  to  all  of  the  terms  and  conditions 
of  this  form.  If  not,  it  should  be  signed  by  someone  so  authorized. 

NOTE  RE  RETURNED  RIGHTS:  Just  as  ACES  now  requires  a  signed  copyright  transfer  form  in  order  to  do  “business  as  usual”,  it  is  the 
intent  of  this  form  to  return  rights  to  the  author  and  employer  so  that  they  too  may  do  ‘Tjusiness  as  usual”.  If  further  clarification  is  required,  please 
contact:  The  Managing  Editor,  R.  W.  Adler,  Naval  Postgraduate  School,  Code  EC/AB,  Monterey,  CA,  93943,  USA  (408)656-2352. 

Please  note  that,  although  authors  are  permitted  to  re-use  all  or  portions  of  their  ACES  copyrighted  material  in  other  works,  this  does  not  include  granting 
third  party  requests  for  reprinting,  republishing,  or  other  types  of  re-use. 

JOINT  AUTHORSHIP 

For  jointly  authored  papers,  only  one  signature  is  required,  but  we  assume  all  authors  have  been  advised  and  have  consented  to  the  terms  of  this  form. 
U.S.  GOVERNMENT  EMPLOYEES 

Authors  who  are  U.S.  Government  employees  are  not  required  to  sign  the  Copyright  Transfer  Form  (Part  A),  but  any  co-authors  outside  the  Government 
are. 


Part  B  of  the  form  is  to  be  used  instead  of  Part  A  only  if  all  authors  are  U.S.  Government  employees  and  prepared  the  paper  as  part  of  their  job. 

NOTE  RE  GOVERNMENT  CONTRACT  WORK:  Authors  whose  work  was  performed  under  a  U.S.  Government  contract  but  who  are  not 
Government  employees  are  required  so  sign  Part  A-Copyright  Transfer  Form.  However,  item  5  of  the  form  returns  reproduction  rights  to  the  U.  S. 
Government  when  required,  even  though  ACES  copyright  policy  is  in  effect  with  respect  to  the  reuse  of  material  by  the  general  public. 

January  2002 


APPLIED  COMPUTATIONAL  ELECTROMAGNETICS  SOCIETY  JOURNAL 
littp;//aces.ee,oleiniss,edu 

INFORMATION  FOR  AUTHORS 


PUBLICATION  CRITERIA 

Each  paper  is  required  to  manifest  some  relation  to  applied 
computational  electromagnetics.  Papers  may  address 
general  issues  in  applied  computational  electromagnetics, 
or  they  may  focus  on  speciflc  applications,  techniques, 
codes,  or  computational  issues.  .While  the  following  list  is 
not  exhaustive,  each  paper  will  generally  relate  to  at  least  one 
of  these  areas: 

1.  Code  validation.  This  is  done  using  internal  checks  or 
experimental,  analytical  or  other  computatimial  data. 
Measured  data  of  ^tential  utility  to  code  validation 
efforts  will  also  be  considered  for  publication. 

2.  Code  performance  analysis.  This  usually  involves 
identification  of  numerical  accuracy  or  other  limitations, 
solution  convergence,  numerical  and  physical  modeling 
error,  and  parameter  tradeoff.  However,  it  is  also 
permissible  to  address  issues  such  as  ease-of-use,  set-up 
time,  run  time,  special  outputs,  or  other  special  features. 

3.  Computational  studies  of  basic  physics.  This  involves 
using  a  code,  algorithm,  or  computational  technique  to 
simulate  reality  in  such  a  way  that  better,  or  new 
physical  insight  or  understanding,  is  achieved. 

4  New  computational  techniques,  or  new  applications  for 
existing  computational  techniques  or  codes. 

5.  “Tricks  of  the  trade”  in  selecting  and  applying  codes 
and  techniques. 

6.  New  codes,  algorithms,  code  enhancement,  and  code 
fixes.  This  category  is  self-explanatory,  but  includes 
significant  changes  to  existing  codes,  such  as 
applicability  extensions,  algorithm  optimization,  problem 
correction,  limitation  removal,  or  other  performance 
improvement.  Note:  Code  (or  algorithm)  capability 
descriptions  are  not  acceptable,  unless  they  contain 
sufficient  technical  material  to  Justify  consideration. 

7.  Code  input/output  issues.  This  normally  involves 
innovations  in  input  (such  as  input  geometry 
standardization,  automatic  mesh  generation,  ,  or 
computer-aided  design)  or  in  output  (whether  it  be 
tabular,  graphical,  statistical,  Fourier-transformed,  or 
otherwise  signal-processed).  Material  dealing  with 
input/output  database  management,  output  interpretation, 
or  other  input/output  issues  will  also  be  considered  for 
publication. 

8.  Computer  hardware  issues.  This  is  the  category  for 
analysis  of  hardware  capabilities  and  limitations  of 
various  types  of  electromagnetics  computational 
requirements.  Vector  and  parallel  computotional 
techniques  and  Implementation  are  of  particular  interest. 


Applications  of  interest  include,  but  are  not  limited  to, 
antennas  (and  their  electromagnetic  enviroiunents),  networks, 
static  fields,  radar  cross  section,  shielding,  radiation  hazards, 
biological  •  effects,  electromagnetic  pulse  (BMP), 
electromagnetic  interference  (EMI),  electromagnetic 
compatibility  (EMC),  power  transmission,  charge  transport, 
dielectric,  magnetic  and  nonlinear  materials,  microwave 
components,  MEMS  technology,  MMIC  technology,  remote 
sensing  and  geometrical  and  physical  optics,  radar  and 
communications  systems,  fiber  optics,  plasmas,  particle 
accelerators,  generators  and  motors,  electromagnetic  wave 
propagation,  non-destructive  evaluation,  eddy  currents,  and 
inverse  scattering. 

Techniques  of  interest  include  frequency-domain  and  time- 
domain  techniques,  integral  equation  and  differential  equation 
techniques,  diffraction  theories,  physical  optics,  moment 
methods,  finite  differences  and  finite  element  techniques, 
modal  expansions,  pertarbation  methods,  and  hybrid  methods. 
This  list  is  not  exhaustive. 

A  unique  feature  of  the  Journal  is  the  publication  of 
unsuccessful  efforts  in  applied  computational 
electromagnetics.  Publication  of  such  material  provides  a 
means  to  discuss  problem  areas  in  electromagnetic  modeling. 
Material  representing  an  unsuccessful  application  or  negative 
results  in  computational  electromgnetics  will  be  considered 
for  publication  only  if  a  reasonable  expectation  of  success 
(and  a  reasonable  effort)  are  reflected.  Moreover,  such 
material  must  represent  a  problem  area  of  potential  interest  to 
the  ACES  membership. 

Where  possible  and  appropriate,  authors  are  required  to 
providt  statements  of  quantitative  accuracy  for  measured 
and/or  computed  data.  This  issue  is  discussed  in  “Accuracy 
&  Publication:  Requiring,  quantitative  accuracy  statements  to 
accompany  data,”  by  E.  K.  Miller,  ACES  Newsletter,  Vol.  9, 
No.  3,  pp.  23-29,  1994,  ISBN  1056-9170. 

EDITORIAL  REVIEW 

In  order  to  ensure  an  appropriate  level  of  quality  control, 
papers  are  peer  reviewed.  They  are  reviewed  both  for 
technical  correcmess  and  for  adherence  to  the  listed 
guidelines  regarding  information  content. 

JOURNAL  CAMERA-READY  SUBMISSION  DATES 

March  issue  deadline  8  January 

July  issue  deadline  20  May 

November  issue  deadline  20  September 

Uploading  an  acceptable  camera-ready  article  after  the 
deadlines  will  result  in  a  delay  in  publishing  this  article. 


STYLE  FOR  CAMERA-READY  COPY 

The  ACES  Journal  is  flexible,  within  reason,  in  regard  to 
style.  However,  certain  requirements  are  in  effect: 

1 .  The  paper  title  should  NOT  be  placed  on  a  separate  page. 
The  title,  authoi^s),  abstract,  and  (space  permitting) 
beginning  of  the  paper  itself  should  all  be  on  the  first 
page.  The  title,  author(s),  and  author  affiliations  should 
be  centered  (center-justified)  on  the  first  page. 

2.  An  abstract  is  REQUIRED.  The  abstract  should  be  a 
brief  summary  of  the  work  described  in  the  paper.  It 
should  state  the  computer  codes,  computational 
techniques,  and  applications  discussed  in  the  paper  (as 
applicable)  and  should  otherwise  be  usable  by  technical 
abstracting  and  indexing  services. 

3.  Either  British  English  or  American  English  spellings 
may  be  used,  provided  that  each  word  is  spelled 
consistently  throughout  the  paper. 

4.  Any  commonly-accepted  format  for  referencing  is 
permitted,  provided  that  internal  consistency  of  format  is 
maintained.  As  a  guideline  for  authore  who  have  no 
other  preference,  we  recommend  that  references  be  given 
by  author(s)  name  and  year  in  the  body  of  the  paper 
(with  alphabetical  listing  of  all  references  at  the  end  of 
the  paper).  Titles  of  Journals,  monographs,  and  similar 
publications  should  be  in  italic  font  or  should  be 
underlined.  Titles  of  papers  or  articles  should  be  in 
quotation  marks. 

5.  Internal  consistency  shall  also  be  maintained  for  other 
elements  of  style,  such  as  equation  numbering.  As  a 
guideline  for  authors  who  have  no  other  preference,  we 
suggest  that  equation  numbers  be  placed  in  parentheses 
at  the  right  column  margin. 

6.  The  intent  and  meaning  of  all  text  must  be  clear.  For 
authors  who  are  NOT  masters  of  the  English  language, 
the  ACES  Editorial  Staff  wilf  provide  assistance  with 
grammar  (subject  to  clarity  of  intent  and  meaning). 

7.  Unused  space  should  be  minimized.  Sections  and 
subsections  should  not  normally  begin  on  a  new  page. 

PAPER  FORMAT 

The  preferred  format  for  initial  submission  and  camera-ready 
manuscripts  is  12  point  Times  Roman  font,  single  line  spacing 
and  double  column  format,  similar  to  that  used  here,  with  top, 
bottom,  left,  and  right  1  inch  margins.  Manuscripts  should  be 
prepared  on  standard  8.5x1 1  inch  paper. 

Only  camera-ready  electronic  files  are  accepted  for 
publication.  The  term  “camera-ready”  means  that  the 
material  is  neat,  legible,  and  reproducible.  Full  details  can 
be  found  on  ACES  site,  Journal  section. 


responsibility  to  provide  acceptable  camera-ready  pdf  files. 
Incompatible  or  incomplete  pdf  files  will  not  be  processed, 
and  authors  will  be  requested  to  re-upload  a  revised 
ax:eptable  version. 

SUBMITTAL  PROCEDURE 

All  submissions  should  be  uploaded  to  ACES  server  through 
ACES  web  site  (http://aces.ee.olemiss.edu)  by  using  the 
upload  button,  journal  section.  Only  pdf  files  are  accepted  for 
submission.  The  file  size  should  not  be  larger  than  SMB, 
otherwise  permission  fl-om  the  Editor-in-Chief  should  be* 
obtained  first  The  Editor-in-Chief  will  acknowledge  the 
electronic  submission  after  the  upload  process  is  successfially 
completed. 

COPYRIGHTS  AND  RELEASES 

Each  primary  author  must  sign  a  copyright  form  and  obtain  a 
release  fixrm  his/her  organization  vesting  the  copyri^t  with 
ACES.  Copyright  forms  are  available  at  ACES,  web  site 
(http://aces.ee.olemiss.edu).  To  shorten  the  review  process 
time,  the  executed  copyright  form  should  be  forwarded  to  the 
Editor-in-Chief  immediately  after  the  completion  of  the 
upload  (electronic  submission)  process.  Both  the  author  and 
his/her  organization  are  allowed  to  use  the  copyri^ted 
material  freely  for  their  own  private  purposes. 

Permission  is  gimted  to  quote  short  passages  and  reproduce 
figures  md  tables  from  and  ACES  Journal  issue  provided  the 
source  is  cited.  Copies  of  ACES  Journal  articles  may  be 
made  in  accoidance  with  usage  permitted  by  Sections  107  or 
108  of  the  U.S.  Copyright  Law.  This  consent  does  not  extend 
to  other  kinds  of  copying,  such  as  for  general  distribution,  for 
advertising  or  promotional  purposes,  for  creating  new 
collective  works,  or  for  resale.  The  reproduction  of  multiple 
copies  and  the  use  of  articles  or  extracts  for  commercial 
purposes  require  the  consent  of  the  author  and  specific 
permission  from  ACES.  Institutional  members  are  allowed  to 
copy  any  ACES  Journal  issue  for  their  internal  distribution 
only. 

PUBLICATION  CHARGES 

ACES  members  are  allowed  12  printed  pages  per  paper 
without  charge;  non-members  are  allowed  8  printed  pages  per 
paper  without  charge.  Mandatoiy  page  charges  of  $75  a  page 
apply  to  all  pages  in  excess  of  12  for  members  or  8  for  non- 
members.  Voluntary  page  charges  are  requested  for  the  free 
(12  or  8)  pages,  but  are  NOT  mandatory  or  required  for 
publication.  A  priority  courtesy  guideline,  which  favora 
members,  applies  to  paper  backlogs.  Authors  are  entitled  to 
15  free  reprints  of  their  articles  and  must  request  these  from 
the  Managing  Editor.  Additional  reprints  are  available  to 
authors,  and  reprints  available  to  non-authors,  for  a  nominal 
fee. 

ACES  Journal  is  abstracted  in  INSPEC,  in  Engineering 
Index,  and  in  DTIC. 


ACES  reserves  the  right  to  edit  any  uploaded  material, 
however,  this  is  not  generally  done.  It  is  the  author(s) 


